<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Code, Life and Learning</title>
	<atom:link href="http://www.briandrought.com/brianblog/index.php?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.briandrought.com/blog</link>
	<description>Powered by Incompetence</description>
	<lastBuildDate>Fri, 30 Jul 2010 18:03:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>viewmessages.com Architecture</title>
		<link>http://www.briandrought.com/blog/?p=98</link>
		<comments>http://www.briandrought.com/blog/?p=98#comments</comments>
		<pubDate>Fri, 30 Jul 2010 17:04:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=98</guid>
		<description><![CDATA[<p>OK, so we’re not the biggest site in the world but we have a fair amount of data, a fair amount of users and speed is very important to me so it’s important everything is as fast as possible. A few people have asked what our architecture is and I thought it’d make an interesting [...]]]></description>
			<content:encoded><![CDATA[<p>OK, so we’re not the biggest site in the world but we have a fair amount of data, a fair amount of users and speed is very important to me so it’s important everything is as fast as possible. A few people have asked what our architecture is and I thought it’d make an interesting post. As is always the way with these things it’s easier to describe with a diagram:</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/07/PearlServerArchitecture.png"><img style="display: inline; border: 0px;" title="PearlServerArchitecture" src="http://www.briandrought.com/blog/wp-content/uploads/2010/07/PearlServerArchitecture_thumb.png" border="0" alt="PearlServerArchitecture" width="801" height="772" /></a></p>
<h2>Web Servers</h2>
<h3>content.viewmessages.com</h3>
<p>First of all, we serve images/bulky javascript and CSS from Amazon Cloudfront CDN which is an incredibly cheap way of offloading those things to the Amazon infrastructure. It also makes the platform much much snapper for our <a href="http://blog.jam.co.uk/?p=233" target="_blank">American users</a>. If you even have a basic website it’s worth looking into using Cloudfront if only because it gives you a second domain to pull you data from which allows the browser to parallelise more downloads.</p>
<h3>totem.viewmessages.com</h3>
<p>Totem is my own long polling server I developed to allow instant communication to the users browser. This allows things like instant new message notification. In short, your browser uses JQuery to request a script from Totem. If there’s no new messages, Totem will sit there for 40 seconds and return nothing. Your browser will then re-request the script and wait for another 40 seconds. If you get sent a U2U for example 5 seconds into the 40 seconds, the web server/background server dealing with the U2U sends a notification to Totem which creates a bit of Javascript to display the U2U notification and sends it back as the response to the original request that was made 5 seconds previously. For more on Totem, read my <a href="http://www.briandrought.com/blog/?p=56" target="_blank">Project Totem</a> blog post.</p>
<h3>static.viewmessages.com</h3>
<p>Because we use a web cluster to serve the main HTML we need a central server for avatars and other central data that we don’t push to Cloudfront. The only challenge here was getting content to it. Security in IIS from the main webcluster meant I couldn’t access the machine directly to I had to <a href="http://www.briandrought.com/blog/?p=41" target="_blank">use a SQL database as a proxy</a></p>
<h3>www.viewmessages.com</h3>
<p>The main web serving is done by a cluster of IIS machines. These are cheap commodity machines in the Google style. 2GB Ram/2GHz Dual Core CPU/80GB drive. Nothing fancy or expensive. By using multiple cheap machines instead of one big expensive one we get vastly better availability (they can be brought offline for updating), far better performance (if you add up the total computing power) at less cost. It&#8217;s a win-win other than it makes the software development slightly more complex at times.</p>
<p>Each machine runs a copy of SQL Express to write access logs to (Which are then copied to the main SQL box when things are quiet) and to store a whole bunch of reasonably static information (such as configuration) to reduce the load on the main SQL box. Each machine can do front end web serving, back end task processing or both. As we need more capacity we can simply add more machines. The load balancers will send the users request to a particular web server using a session cookie. If the server goes down, the failover happens within 10 seconds and you’ll be transparently placed onto a different server. </p>
<p>The back end task processing is something I’m particularly pleased with as it allows the processing load to be distributed across as many machines as we need. At the moment these are the same machines that serve the front end web stuff but at a later date will be split off into a dedicated back end cluster. All the back end processing is done by requesting webpages from a queue. If you want to read about how we process background tasks <a href="http://www.briandrought.com/blog/?p=14" target="_blank">heres my blog post about it</a></p>
<h2>Background Servers</h2>
<h3> Background / Offline Processing</h3>
<p>As mentioned above, this is done using queues of webpages and is processed by the main web cluster</p>
<h3>Main SQL Store</h3>
<p>Nothing interesting here really I’m afraid. Just a reasonably beefy Dell machine with data replicated to a hotspare backup.</p>
<h3>Solr Server</h3>
<p>I’m now using <a href="http://lucene.apache.org/solr/" target="_blank">Solr</a> to generate the data for the new <a href="http://blog.jam.co.uk/?p=274" target="_blank">Message Analytics Feature</a> . I’ll do a blog post about it at some point in the future but it’s incredibly fast compared to using XML data with SQL. Doing a ‘Group By’ on an XML value in SQL was taking around 1200ms for a particular data set (with an unloaded server). Using Solr on a *much* less powerful machine took 20ms. It’s an incredible piece of software if slightly tricky to use.</p>
<h3>Memcached</h3>
<p>The staple of every high performance website. Memcached is a memory based data store. I don’t use it to store reasonably static data as that’s done in the ASP.Net cache object (which is 10x quicker due to it being on the machine itself), but I use Memcached to store precompiled data that’s used across machines. For example, if you get sent a U2U it’s a background task that ‘delivers’ it to your inbox. This task puts the message in your inbox, adds it to the search database, then takes the most recent 10 U2U’s for you and recompiles the HTML you see in your ‘Recent U2U Messages’ widget on your homepage and inserts it into Memcached.  The background task then notifies the Totem server about the U2U, Totem notifies your browser, your browser requests the new HTML blob back from the webserver and guess what? It’s already been generated and the webserver just grabs it from Memcached. The beauty of using it over the ASP.Net cache is that cached objects can be shared across machines.</p>
<p>Memcached is a great bit of software and we’ve had absolutely zero issues with it.  The current stats from our memcached instance are below:</p>
<p><span style="font-family: Courier New; font-size: x-small;">STAT uptime 28625365 <strong>(nearly a year)<br />
</strong>STAT time 1280509032<br />
STAT pointer_size 32<br />
STAT curr_items 30626   (<strong>It’s 100,000 or so during busy periods)<br />
</strong>STAT total_items 10108777<br />
STAT bytes 9450225<br />
STAT curr_connections 17<br />
STAT total_connections 10040<br />
STAT connection_structures 24<br />
STAT cmd_get 39701711<br />
STAT cmd_set 10108777<br />
STAT get_hits 33158267  (<strong>It’s saved a LOT of SQL reads!)<br />
</strong>STAT get_misses 6543444<br />
STAT bytes_read 3126086257<br />
STAT bytes_written 821258193  <strong>(It’s served 800GB!)<br />
</strong>STAT limit_maxbytes 524288000</span></p>
<p>And to think, I was almost tempted to use Velocity instead. <a href="http://www.briandrought.com/blog/?p=29" target="_blank">You can read why I didn&#8217;t</a>.</p>
<h2>Summary</h2>
<p>By applying a bit of thought and leveraging the right technology for each part of the puzzle we&#8217;ve got a platform that *way* outperforms a traditional single big webserver setup. We also have minimal load on the main SQL box by using quite aggressive caching (In memory on the local webserver, in Memcached and in SQL Express on the local webserver).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=98</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hot air extraction &#8211; more efficient server room cooling</title>
		<link>http://www.briandrought.com/blog/?p=91</link>
		<comments>http://www.briandrought.com/blog/?p=91#comments</comments>
		<pubDate>Fri, 28 May 2010 14:36:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=91</guid>
		<description><![CDATA[<p> </p>
<p>In addition to the power for servers, a huge cost we have is cooling them. 6kW of servers is going to require some chilling. Our data room has air conditioning and it works very hard for a living particularly in summer where the heat differential on the aircon exchanger outside is lower. In a big [...]]]></description>
			<content:encoded><![CDATA[<p> </p>
<p>In addition to the power for servers, a huge cost we have is cooling them. 6kW of servers is going to require some chilling. Our data room has air conditioning and it works very hard for a living particularly in summer where the heat differential on the aircon exchanger outside is lower. In a big data centre you’d pump chilled air into a ‘cold aisle’ in front of a load of racks, and then have a ‘hot aisle’ behind them where you suck the air back into the A/C. Unfortunately our building wasn’t designed with this in mind so we simply have a wall mounted unit that cools the whole room. The problem with cooling the room though, is there’s no way of making sure the servers see chilled air, they might get air that has come directly from the back of the rack and sucked back round again.</p>
<p>Whilst doing some tidying up I spotted one of our old extraction fans from years gone by. When we were a much smaller company, air was drawn into the room at one end and extracted at the other end. It kept things cool enough until we started to need more equipment and then A/C was the only option.</p>
<p>Anyway, below, you can see the unused fan and our main server rack to the left.</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/05/freshaircooling1.jpg"><img style="display: inline; border: 0px;" title="freshaircooling1" src="http://www.briandrought.com/blog/wp-content/uploads/2010/05/freshaircooling1_thumb.jpg" border="0" alt="freshaircooling1" width="644" height="335" /></a></p>
<p>There are probably some very expensive hot air extraction systems on the market, but I figured there was no point in spending a lot of cash to trial it out. B&amp;Q to the rescue for some gaffa tape and guttering pipe. Add in the old box from my <a href="http://www.hermanmiller.com/Products/Mirra-Chairs" target="_blank">Herman Miller &#8216;Mirra&#8217;</a> chair, and an hour of creativity and we have a working hot air extraction system…..</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/05/freshaircooling2.jpg"><img style="display: inline; border: 0px;" title="freshaircooling2" src="http://www.briandrought.com/blog/wp-content/uploads/2010/05/freshaircooling2_thumb.jpg" border="0" alt="freshaircooling2" width="644" height="450" /></a></p>
<p>I simply made a baffle infront of the fan and added ducting that goes down behind the server. It’s not pretty, but it does work:</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/05/freshaircoolinggraph.jpg"><img style="display: inline; border: 0px;" title="freshaircoolinggraph" src="http://www.briandrought.com/blog/wp-content/uploads/2010/05/freshaircoolinggraph_thumb.jpg" border="0" alt="freshaircoolinggraph" width="634" height="484" /></a></p>
<p>There… proof it works! We dropped the temp measured at the top of the rack by a degree. Air intake temps on the servers lowered even more. Our SQL server was drawing in 24 degree air previously, and is now a lot more chilled. (21 degrees!). The UPS unit on the floor beside the rack had a similar drop from 25deg to 22deg.</p>
<p>We’ve massively reduced the strain on our air con unit for the grand sum of about £50 and the overhead of a 100W fan. (which is more than offset by the potential savings in air con for that room)</p>
<p>The next thing to try is adding curtains from the side of the rack to the wall to force the hot air into the extracted area.</p>
<p><object width="640" height="385"><param name="movie" value="http://www.youtube.com/v/zRwPSFpLX8I&amp;hl=en_GB&amp;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/zRwPSFpLX8I&amp;hl=en_GB&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=91</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick book review: Leaving Microsoft to Change the World</title>
		<link>http://www.briandrought.com/blog/?p=84</link>
		<comments>http://www.briandrought.com/blog/?p=84#comments</comments>
		<pubDate>Wed, 19 May 2010 14:23:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=84</guid>
		<description><![CDATA[<p>&#160;</p>
<p></p>
<p>&#160;</p>
<p>http://www.amazon.co.uk/Leaving-Microsoft-Change-World-Entrepreneurs/dp/0007237030/</p>
<p>&#160;</p>
<p>My rating: 9/10</p>
<p>A great read overall… not quite Three Cups Of Tea, but inspiring nonetheless. Unlike Greg Mortenson, John Wood started out from a very strong position as a senior exec at Microsoft. It’s fascinating to see how he uses lessons from his past life working with highly driven people like Steve Ballmer to create [...]]]></description>
			<content:encoded><![CDATA[<p>&#160;</p>
<p><img src="http://ecx.images-amazon.com/images/I/51Sxr-hgOLL._SL500_AA300_.jpg" /></p>
<p>&#160;</p>
<p><a title="http://www.amazon.co.uk/Leaving-Microsoft-Change-World-Entrepreneurs/dp/0007237030/" href="http://www.amazon.co.uk/Leaving-Microsoft-Change-World-Entrepreneurs/dp/0007237030/">http://www.amazon.co.uk/Leaving-Microsoft-Change-World-Entrepreneurs/dp/0007237030/</a></p>
<p>&#160;</p>
<p>My rating: 9/10</p>
<p>A great read overall… not quite <a href="http://www.amazon.co.uk/Three-Cups-Tea-Greg-Mortenson/dp/0141034262/" target="_blank">Three Cups Of Tea</a>, but inspiring nonetheless. Unlike Greg Mortenson, John Wood started out from a very strong position as a senior exec at Microsoft. It’s fascinating to see how he uses lessons from his past life working with highly driven people like Steve Ballmer to create a non profit that has improved education for more than 4 million children in Bangladesh, Cambodia, India, Laos, Nepal, South Africa, Sri Lanka, Vietnam and Zambia.</p>
<p>As well as making you want to jack it all in for something more meaningful, it’s got some half decent business lessons in there. </p>
<p>If you had to make the choice, I’d go with Three Cups of Tea every time, but this is still a cracker.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=84</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Re-inventing the spell checker</title>
		<link>http://www.briandrought.com/blog/?p=66</link>
		<comments>http://www.briandrought.com/blog/?p=66#comments</comments>
		<pubDate>Wed, 05 May 2010 14:58:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=66</guid>
		<description><![CDATA[Background
<p>Our system does a &#8216;review&#8217; of messages after our operators save them. It checks for things like fields not being filled in where they normally are, but most importantly it checks for spelling mistakes and typos.</p>
<p>We used to use the Telerik Radspell spell checker component in a back end web service. It worked adequately but [...]]]></description>
			<content:encoded><![CDATA[<h3>Background</h3>
<p>Our system does a &#8216;review&#8217; of messages after our operators save them. It checks for things like fields not being filled in where they normally are, but most importantly it checks for spelling mistakes and typos.</p>
<p>We used to use the <a href="http://www.telerik.com/products/aspnet-ajax/spell.aspx" target="_blank">Telerik Radspell</a> spell checker component in a back end web service. It worked adequately but it had a limited dictionary, didn&#8217;t know the Queen&#8217;s English (it uses American spellings) and the suggested corrections were often a bit&#8230;&#8230; random as you can see from the screenshot below. The word column contains the supplied misspelling and the subsequent columns are the suggestions (in order).</p>
<p>&#160;</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/05/spellcheckbefore.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="spellcheckbefore" border="0" alt="spellcheckbefore" src="http://www.briandrought.com/blog/wp-content/uploads/2010/05/spellcheckbefore_thumb.png" width="327" height="380" /></a> </p>
<p>&#160;</p>
<h4>How does an average spell checker work?</h4>
<p>It&#8217;s pretty simple to make a crude spell checker and all you need is a dictionary of correct words. You take each word and check if it exists in the dictionary. If not, you then loop through the dictionary seeing how different each correct word is to the supplied misspelled word. There&#8217;s a well used algorithm for seeing how different words are. This is called the <a href="http://en.wikipedia.org/wiki/Levenshtein_distance" target="_blank">Levenstein distance</a>, or &#8216;edit distance&#8217;. Each addition/subtraction/substitution counts as an &#8216;edit&#8217; For instance, take the misspelling of &#8216;hosspitle&#8217;</p>
<p style="padding-left: 30px"><font color="#800000" face="Courier New">hosspitle -&gt; hospitle      <br />hospitle -&gt; hospitae       <br />hospitae -&gt; hospital</font></p>
<p>That&#8217;s an edit distance of 3. The lower the Levenstein distance the more the words are alike.</p>
<p>There&#8217;s a slight snag with doing it this way though. If you have even a small dictionary of say 10,000 words you&#8217;d need to compare each of the 10,000 words to your misspelling. There&#8217;s no real way of pre-computing it as you can&#8217;t possibly cater for all misspellings. It&#8217;d be quite a costly computational exercise. We can get a much much smaller subset of words to compare by selecting them based on a phonetic algorithm. The most common of which is <a href="http://en.wikipedia.org/wiki/Soundex" target="_blank">soundex</a>. This way we can pre-compute the soundex code for all of our known good words. </p>
<p>For example, you can get the soundex value for ‘hosspitle’ using SQL2008 by doing select soundex(&#8216;hosspitle&#8217;). This gives a value of H213. If I check the soundex of ‘hospital’ I also get H213. This means that the correct result would be in the subset which is a good start!</p>
<h3>Spell Check 2.0&#8230;&#8230;.</h3>
<h4>Why?</h4>
<p>Because crappy looking messages with spelling mistakes and typos don&#8217;t give the client a sense of professionalism. The previous spelling corrector called wolf a bit too often and I found that a lot of operators would get used to ignoring it. Also, if it didn&#8217;t list the correct suggestion first time around it took them a while to go back into the message, correct the word and resend&#8230; there was a temptation to just ignore the mistake and send it anyway.</p>
<h4>Improvement #1 &#8211; Junking the Telerik engine.</h4>
<p>First step is to reproduce what the Telerik spell checker does so I can start to develop my own system. This turned out to be pretty easy. Just find a dictionary of English words on the tinterweb, upload to SQL, create a column for a Soundex field and use the built in SQL Soundex function to pre-compute the soundex’s by doing “update englishwords set soundexvalue = soundex(word)”. </p>
<p>You can then select your word subset back by doing something like “select * from words where soundexvalue = soundex(@mywrongword)”. Using the ‘hosspitle’ example, my dictionary gives me 47 records including </p>
<p>&#160;</p>
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="200">hagbut          <br />hasped           <br />hispid           <br />hackbut           <br />hagbuts           <br />hawkbit           <br />hexapod           <br />hackbuts           <br />hawkbits           <br />hexapods           <br />hexapody           <br />hiccuped</td>
<td valign="top" width="200">hospital          <br />hagbuteer           <br />hagbutter           <br />hiccupped           <br />hispidity           <br />hospitals           <br />hospitium           <br />houseboat           <br />housebote           <br />hospitalizing           <br />hospitableness           <br />hospitalization</td>
</tr>
<tr>
<td valign="top" width="200">&#160;</td>
<td valign="top" width="200">&#160;</td>
</tr>
</tbody>
</table>
<p>(No, I don’t know what half of those words are either!)</p>
<p>Anyway, once I get our subset I order it by Edit Distance, so hospital will come amongst the first few search results. This gave me exactly the same results as the Telerik engine and therefore a decent baseline to work from…..</p>
<p>&#160;</p>
<h4>Improvement #2 – Using a bigger dictionary</h4>
<p>Bigger is better most of the time, and I needed more words. Searching the internet for a while I found some decent sized CSV’s which had lists of words along with the number of occurrences that word has been seen (this will be useful later). I uploaded this in exactly the same way as improvement 1, into a table called BigDictionary but with a field for the occurrences. The system now uses the previous English words dictionary just to check if it’s a valid word. If I don’t see it in the table I then use the BigDictionary table to retrieve a list of possibilities. </p>
<h4>Improvement #3 – Learn words itself</h4>
<p>If the spelling corrector uses a fixed dictionary, it doesn’t have a hope of keeping track with the modern world. For example, just looking at the ‘wrong’ words being flagged up by the system as it was at stage 2 I could see it was probably annoying operators. It had flagged up words such as Skype, Mercedes, Google, Ferrari, Bosch, Microsoft, Nokia etc. I wrote a small routine to go through two years worth of messages, separate out each word and upload it to a table. If the word was already in the table, I incremented an ‘occurrences’ field (again, this will be useful later!). I set the system to gate the results so uncommon words don’t appear in the suggestions. This helps to stop any misspellings being learnt as valid words. </p>
<p>I check the BigDictionary table for suggestions, then the LearntWords table and aggregate the suggestions before sorting by edit distance. </p>
<h4>Improvement #4 – Double Metaphone</h4>
<p>The soundex algorithm is pretty basic and it’s totally reliant on the first letter being correct. This meant that the pre-computed subset was often a bit limited and wouldn’t contain the correct result.&#160; After doing a big of research into phonetic algorithms it seemed like <a href="http://en.wikipedia.org/wiki/Double_Metaphone" target="_blank">Double Metaphone</a> was a good bet and a fair bit more advanced than soundex. I created a Primary and Secondary Metaphone field for all of my dictionary tables so far (including the learnt words table) and made a script to calculate the primary and secondary metaphone values for every word. After an hour of it grinding away I had precomputed values for metaphone as well as soundex. I changed my SQL queries to something like <font face="Courier New">Select * from dictionary where (pm = @pm or sm=@sm or </font><a href="mailto:soundex=@soundex"><font face="Courier New">soundex=@soundex</font></a><font face="Courier New">).</font> This instantly made the results set bigger and it seemed to get a few more hits particularly if the typo was early on in the word.</p>
<h4>Improvement #5 – Weight by Frequency as well as Edit Distance</h4>
<p>If you look at the screenshot of the initial results you’ll see the Telerik checker suggested ‘darvon’, ‘driven’, ‘thriven’ for the typo ‘dirven’. This is because it has no idea how common a word is, and it just so happens than ‘darvon’ has the same edit distance from ‘dirven’ as ‘driven’. I have absolutely no idea what a darvon is, and I suspect neither would our callers. Fortunately, in the BigDictionary and my LearntWords tables I have an integer field essentially telling me how common that word is. I decided against simply using the count as a multiplier of a ‘relevancy’ as some words are hugely more common than others and would overwhelm&#160; the edit distance… for example if you put ‘thene’ instead of ‘theme’, you’d find that it’d suggest ‘the’ as it’s vastly more common than theme, or even them and then. Instead, I used the ‘position’ as the multiplier, so my SQL became something like: </p>
<pre class="csharpcode">

<span class="kwrd">Select</span> * <span class="kwrd">from</span> <span class="kwrd">dictionary</span> <span class="kwrd">where</span> (pm = @pm <span class="kwrd">or</span> sm=@sm <span class="kwrd">or</span> soundex=@soundex) <span class="kwrd">order</span> <span class="kwrd">by</span> wordcount desc
</pre>
<p>I then take the results in and do something like:</p>
<pre class="csharpcode"><span class="kwrd">For</span> <span class="kwrd">Each</span> Result
    Position += 1
    Score = Position * EditDistance(Result,Word)
<span class="kwrd">Next</span></pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>The lower the score, the more relevant the results. </p>
<p>&#160;</p>
<h4>Improvement #6 – Learn from our mistakes</h4>
<p>Looking through the log of mistakes and corrections it seemed to be that the same ones were coming up again and again, for example ‘Plesae’ being changed to ‘Please’. It’s pretty obvious, but the system should look at what’s been corrected for that same mistake and bring up the correction in the results. To recap, the process we’re now doing is:</p>
<ol>
<li>Check EnglishWords table to see if it’s a common word </li>
<li>Check LearntMistakes to see if we’ve seen the mistake before, if so, load in the corrections into an array of suggestions </li>
<li>Search LearntWords by Soundex and Double Metaphone to see any soundalike word we’ve seen before in a previous message </li>
<li>Search BigDictionary by Soundex and Double Metaphone to see any soundalike words that are in the dictionary </li>
<li>Score all suggestions retrieved by Edit Distance and Position </li>
</ol>
<p>&#160;</p>
<h4>Improvement #7 – Weight by source</h4>
<p>Now we’re pulling in previous corrections, it’s pretty obvious that some sources are more relevant than others. For example, if I’ve seen ‘plesae’ changed to ‘please’ 80 times, it’s a fair bet when I next see ‘plesae’ they didn’t mean ‘police’, ‘palace’ etc. So, our array of suggestions that is being filled by our LearntMistakes, LearntWords and BigDictionary suggestions now gains a source column, and our weighting code becomes something like:</p>
<pre class="csharpcode"><span class="kwrd">For</span> <span class="kwrd">Each</span> Result

    <span class="kwrd">Select</span> <span class="kwrd">Case</span> Source
       <span class="kwrd">Case</span> PreviousCorrections
          SourceWeight = 10
       <span class="kwrd">Case</span> LearntWords
          SourceWeight = 15
       <span class="kwrd">Case</span> BigDictionary
          SourceWeight = 20
    <span class="kwrd">End</span> <span class="kwrd">Select</span>

    Position += 1
    Score = Position * EditDistance(Result,Word) * SourceWeight
<span class="kwrd">Next</span></pre>
<p>
<style type="text/css">
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>Again, the lower the weight, the higher the relevance.</p>
<p>&#160;</p>
<h4>Improvement #8 – Learn words by client not just globally</h4>
<p>Some of our clients have industry specific words, for example, if someone phones up to book a car with <a href="http://www.youtube.com/watch?v=VJ7wlBTcPJA" target="_blank">Supercar Experiences</a> and we see the typo Miserati it’s pretty likely the operator meant Maserati and not Miserable. When I processed the previously seen words from the last few years, I actually created two tables. One that was global across all clients, and one that had a client code on each row, i.e. treat the learnt words separately per company. I use a much lower threshold on this table so the system is quicker to allow learnt words into the suggestions than on the global table. This is purely because any wrong words that get learnt will only appear in suggestions for that company and won’t poison the global dictionaries.</p>
<p>&#160;</p>
<h4>Improvement #9 – Treat transpositions differently</h4>
<p>One snag with the Levenshtein distance algorithm as it has no way of detecting transpositions, so ‘ditsance’ is an edit distance of 2 from ‘distance’. Changing to the <a href="http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance" target="_blank">Damerau–Levenshtein distance</a> algorithm changes that and seemed to massively improve results where it was just a transposition. </p>
<p>&#160;</p>
<h4>Improvement #10 – Context </h4>
<p>This is my favourite part……! By now the system is getting pretty smart and the number of messages going out with mistakes is falling rapidly (I re-analyse every message that’s sent after the review process so I can count word errors) but there’s still something missing and sometimes it seems a bit woeful compared to the human brain. We’re pretty good at reading typos and half the time our brain has corrected the word without us noticing.. this is because we know what word to expect. The computer however, doesn’t.</p>
<p>Consider the following sentence:&#160; “sending info regarding meeting she had <strong>witrh</strong> you last month”. We can see they clearly meant with, but the computer has no idea and has to evaluate it without context. </p>
<p>I fed the system a made up message with words in context that it had previously struggled on. The message was&#160; “you itnerested in. off hlaf way. some ifno on. refused to leavr number. was looking to spk with accounts. her leter box. meeting he had witrh you last week. llease call regarding”</p>
<p>You can see in the screenshot below that the primary suggestion in word2 field was pretty rotten most of the time:</p>
<p>&#160;</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/05/beforecontext.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="beforecontext" border="0" alt="beforecontext" src="http://www.briandrought.com/blog/wp-content/uploads/2010/05/beforecontext_thumb.png" width="660" height="311" /></a> </p>
<p>&#160;</p>
<p>What if, we had a massive database of text……? Lucky really, we do.</p>
<p>I wrote a routine to go back through our previous messages and split every sentence into three word groups, so the sentence “Wanted to follow up on the meeting he had with you last week” would give us:</p>
<table border="0" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="133"><strong>Word1</strong></td>
<td valign="top" width="133"><strong>Word2</strong></td>
<td valign="top" width="133"><strong>Word3</strong></td>
</tr>
<tr>
<td valign="top" width="133">wanted</td>
<td valign="top" width="133">to</td>
<td valign="top" width="133">follow</td>
</tr>
<tr>
<td valign="top" width="133">to</td>
<td valign="top" width="133">follow</td>
<td valign="top" width="133">up</td>
</tr>
<tr>
<td valign="top" width="133">follow</td>
<td valign="top" width="133">up</td>
<td valign="top" width="133">on</td>
</tr>
<tr>
<td valign="top" width="133">up</td>
<td valign="top" width="133">on</td>
<td valign="top" width="133">the</td>
</tr>
<tr>
<td valign="top" width="133">on</td>
<td valign="top" width="133">the</td>
<td valign="top" width="133">meeting</td>
</tr>
<tr>
<td valign="top" width="133">the</td>
<td valign="top" width="133">meeting</td>
<td valign="top" width="133">he</td>
</tr>
<tr>
<td valign="top" width="133">meeting</td>
<td valign="top" width="133">he</td>
<td valign="top" width="133">had</td>
</tr>
<tr>
<td valign="top" width="133">he</td>
<td valign="top" width="133">had</td>
<td valign="top" width="133">with</td>
</tr>
<tr>
<td valign="top" width="133">had</td>
<td valign="top" width="133">with</td>
<td valign="top" width="133">you</td>
</tr>
<tr>
<td valign="top" width="133">with</td>
<td valign="top" width="133">you</td>
<td valign="top" width="133">last</td>
</tr>
<tr>
<td valign="top" width="133">you</td>
<td valign="top" width="133">last</td>
<td valign="top" width="133">week</td>
</tr>
</tbody>
</table>
<p>&#160;</p>
<p>So there we have it… context. Whizzing through our database of past messages gave me around a million different three word phrases. Again, I used a ‘count’ so if it was a common phrase such as “please call back” I’d just increment the count if it was already in the database.</p>
<p>Then, I added another stage to the spell check, which was find words in context. If I came across an unknown word, I’d simply look in my table of the word phrases by using the surrounding words. For example, if I have ‘please ca regarding’ I’d simply search for any row where word1=please and word3=regarding. Here are some example results:</p>
<p>Please <strong>call</strong> regarding</p>
<p>Please <strong>email</strong> regarding</p>
<p>Please <strong>contact</strong> regarding</p>
<p>I then load all the returned middle words into my array, giving them a low weighting so they score highly</p>
<p>This context method gives the engine a much better idea of what the word could be than previously. Without context, the ‘please ca’ example the suggestion would likely be ‘please can’ which obviously makes no sense if the following word is ‘regarding’ but would make a lot of sense if word3 was ‘you’.</p>
<p>This screenshot shows how much better the results are with an idea of context:</p>
<p>&#160;</p>
<p><a href="http://www.briandrought.com/blog/wp-content/uploads/2010/05/aftercontext.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="aftercontext" border="0" alt="aftercontext" src="http://www.briandrought.com/blog/wp-content/uploads/2010/05/aftercontext_thumb.png" width="696" height="477" /></a> </p>
<p>&#160;</p>
<h4>Stage #11 – Always learning</h4>
<p>Goes without saying really, but the system continuously learns words and three word phrases from each new message</p>
<p>&#160;</p>
<h4>Stage #12 – Wrong words</h4>
<p>The danger with the system learning is that it could learn wrong words. I have a block process and once a week I check for any words that it’s learnt that are above or near the inclusion thresholds to appear in the results. With a single click I can either delete the word from the tables, or delete and block the word from ever being learnt by adding it to a BlackListedWords table.</p>
<p>&#160;</p>
<h3>Summary</h3>
<p>The process we’re now doing is:</p>
<ol>
<li>Check EnglishWords table to see if it’s a common word </li>
<li>Check LearntMistakes to see if we’ve seen the mistake before, if so, load in the corrections into an array of suggestions </li>
<li>Check ThreeWordPhrases using context to see what the word could be</li>
<li>Search LearntWords by Soundex and Double Metaphone to see any soundalike word we’ve seen before in a previous message for this client</li>
<li>Search LearntWords by Soundex and Double Metaphone to see any soundalike word we’ve seen before in any previous message (higher threshold)</li>
<li>Search BigDictionary by Soundex and Double Metaphone to see any soundalike words that are in the dictionary </li>
<li>Score all suggestions retrieved by Edit Distance, Position and a Source weighting</li>
</ol>
<p>&#160;</p>
<h3>Conclusion</h3>
<p>Has it made any difference? Yes!!</p>
<p>As I mentioned before, I re-analyse every message that’s sent after the review process. To make it fair, I re-analysed the past 3 months worth and did some stats. The number of spelling mistakes and typos was never really very high as we have a very <a href="http://www.jam.co.uk/staffrewards.aspx" target="_blank">strict QC policy</a> but in percentage terms, going on the two weeks the new system has been in place, the number of mistakes sent out to clients<font size="4"> </font><strong><font size="3">has dropped by 85%</font>. </strong>It also speeds up the operators as if they spot a mistake it used to take a while to correct if the suggestions were poor. </p>
<p>All in all, a very worth while exercise and a great learning project… I ended up learning linguistics, re-learning probability and reading some ‘challenging’ research papers!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=66</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Monitoring Electricity Usage</title>
		<link>http://www.briandrought.com/blog/?p=62</link>
		<comments>http://www.briandrought.com/blog/?p=62#comments</comments>
		<pubDate>Thu, 25 Feb 2010 14:29:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=62</guid>
		<description><![CDATA[<p>Simple one, this one&#8230;</p>
<p>We wanted to see exactly how much power we were using, and wanted to be able to display this information to staff.</p>
<p>First off, you need a monitoring device. I opted for the CurrentCost Envi with the optional data lead (and two more sensors as we&#8217;re on three phase!)</p>
<p>Next, you download the driver [...]]]></description>
			<content:encoded><![CDATA[<p>Simple one, this one&#8230;</p>
<p>We wanted to see exactly how much power we were using, and wanted to be able to display this information to staff.</p>
<p>First off, you need a monitoring device. I opted for the <a href="http://www.currentcost.com/product-cc128.html">CurrentCost Envi</a> with the optional data lead (and two more sensors as we&#8217;re on three phase!)</p>
<p>Next, you download the driver from the CurrentCost site. Then you plug the monitor into your USB port. In theory it&#8217;s now pumping data into COM3 at 56700 baud. Ace.</p>
<p>A quick check with HyperTerminal (you have to go hunting for this, it died with XP!) and sure as hell&#8230; we have some data coming in. The Envi pumps in the current readings at 6 second intervals. Cool.</p>
<p>Now, a teeny tiny bit of code in VB.Net gets the data into your app. With .NET 3.5 you get a nice SerialPort control. Drag one of those onto your form, and then add this code:</p>
<pre class="csharpcode"><span class="kwrd">Private</span> <span class="kwrd">Sub</span> Form1_Load(<span class="kwrd">ByVal</span> sender <span class="kwrd">As</span> System.<span class="kwrd">Object</span>, <span class="kwrd">ByVal</span> e <span class="kwrd">As</span> System.EventArgs) <span class="kwrd">Handles</span> <span class="kwrd">MyBase</span>.Load
SerialPort1.PortName = <span class="str">"COM3"</span>
SerialPort1.BaudRate = 57600
SerialPort1.Handshake = IO.Ports.Handshake.None
SerialPort1.Open()
<span class="kwrd">End</span> <span class="kwrd">Sub</span>

<span class="kwrd">Private</span> <span class="kwrd">Sub</span> SerialPort1_DataReceived(<span class="kwrd">ByVal</span> sender <span class="kwrd">As</span> <span class="kwrd">Object</span>, <span class="kwrd">ByVal</span> e <span class="kwrd">As</span> System.IO.Ports.SerialDataReceivedEventArgs) <span class="kwrd">Handles</span> SerialPort1.DataReceived

<span class="kwrd">Dim</span> datain <span class="kwrd">As</span> <span class="kwrd">String</span> = <span class="str">""</span>
datain = SerialPort1.ReadLine()
System.Diagnostics.Debug.Print(datain)

<span class="kwrd">End</span> <span class="kwrd">Sub</span></pre>
<p> </p>
<p>Tada!!!! You now have live electricity readings from within your app, coming in nice XML blobs like this:</p>
<pre class="csharpcode"><span class="kwrd">&lt;?</span><span class="html">xml</span> <span class="attr">version</span><span class="kwrd">="1.0"</span> <span class="attr">encoding</span><span class="kwrd">="utf-8"</span> ?<span class="kwrd">&gt;</span>

<span class="kwrd">&lt;</span><span class="html">msg</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">src</span><span class="kwrd">&gt;</span>CC128-v0.12<span class="kwrd">&lt;/</span><span class="html">src</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">dsb</span><span class="kwrd">&gt;</span>00001<span class="kwrd">&lt;/</span><span class="html">dsb</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">time</span><span class="kwrd">&gt;</span>12:38:19<span class="kwrd">&lt;/</span><span class="html">time</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">tmpr</span><span class="kwrd">&gt;</span>18.5<span class="kwrd">&lt;/</span><span class="html">tmpr</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">sensor</span><span class="kwrd">&gt;</span>0<span class="kwrd">&lt;/</span><span class="html">sensor</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">id</span><span class="kwrd">&gt;</span>00077<span class="kwrd">&lt;/</span><span class="html">id</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">type</span><span class="kwrd">&gt;</span>1<span class="kwrd">&lt;/</span><span class="html">type</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;</span><span class="html">ch1</span><span class="kwrd">&gt;</span>
    <span class="kwrd">&lt;</span><span class="html">watts</span><span class="kwrd">&gt;</span>02330<span class="kwrd">&lt;/</span><span class="html">watts</span><span class="kwrd">&gt;</span>
  <span class="kwrd">&lt;/</span><span class="html">ch1</span><span class="kwrd">&gt;</span>
<span class="kwrd">&lt;/</span><span class="html">msg</span><span class="kwrd">&gt;</span></pre>
<p><!-- .csharpcode, .csharpcode pre { 	font-size: small; 	color: black; 	font-family: consolas, "Courier New", courier, monospace; 	background-color: #ffffff; 	/*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt  { 	background-color: #f4f4f4; 	width: 100%; 	margin: 0em; } .csharpcode .lnum { color: #606060; } --> </p>
<p>A bit of XML jiggery pokery and you have a reasonably accurate data feed of your power readings in your SQL server.</p>
<p>To see what I did with the data, have a look at the <a title="JAM Blog" href=" http://blog.jam.co.uk/?p=30 " target="_blank">JAM Blog</a></p>
<p><!-- .csharpcode, .csharpcode pre { 	font-size: small; 	color: black; 	font-family: consolas, "Courier New", courier, monospace; 	background-color: #ffffff; 	/*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt  { 	background-color: #f4f4f4; 	width: 100%; 	margin: 0em; } .csharpcode .lnum { color: #606060; } --></p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=62</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Project Totem &#8211; A Long Polling server (Part 1)</title>
		<link>http://www.briandrought.com/blog/?p=56</link>
		<comments>http://www.briandrought.com/blog/?p=56#comments</comments>
		<pubDate>Wed, 25 Nov 2009 15:56:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=56</guid>
		<description><![CDATA[<p>&#160;</p>
Normal Polling
<p>Let’s start with normal polling. The browser simply runs some Javascript on a timer that repeatedly checks for new data on the server. The problem with this is there a trade off between latency and bandwidth. If you were to use a timer that ran every minute your server load would be minimal…. but [...]]]></description>
			<content:encoded><![CDATA[<p>&#160;</p>
<h2>Normal Polling</h2>
<p>Let’s start with normal polling. The browser simply runs some Javascript on a timer that repeatedly checks for new data on the server. The problem with this is there a trade off between latency and bandwidth. If you were to use a timer that ran every minute your server load would be minimal…. but there’d also be up to a minute before the user saw the changed data. You could drop it to a very short interval but you’d have a LOT of requests to your site. </p>
<p>&#160;</p>
<p> <img style="padding:5px;border-bottom: black 2px solid; border-left: black 2px solid; border-top: black 2px solid; border-right: black 2px solid" src="files/normalpollingprocess.png" />
<p>&#160;</p>
<p>The server setup is unchanged from a normal web server setup:</p>
<p>&#160;</p>
<p> <img style="padding:5px;border-bottom: black 2px solid; border-left: black 2px solid; border-top: black 2px solid; border-right: black 2px solid" src="files/normalpollingnetwork.png" />
<p>&#160;</p>
<p>&#160;</p>
<h2>Long Polling</h2>
<p>If we don’t want the bandwidth/latency trade off there is another way. You can use the timeout function of most AJAX libraries (I use jQuery) to perform ‘long polling’. Instead of asking the conversation between the browser and the server going like this:</p>
<blockquote><p>“Anything new…………………..? Anything new…………………..? Anything new…………………..? Anything new…………………..?Anything new…………………..? Anything new…………………..?Anything new…………………..? Anything new…………………..? Anything new…………………..?”</p>
</blockquote>
<p>It goes more like this:</p>
<blockquote><p>“Tell me if anything new comes along in the next 20 seconds ………………………………………… ………………………………………………………………….</p>
<p>Nothing? OK, let’s try again… </p>
<p>Tell me if anything new comes along in the next 20 seconds ………………………………………… ………………………………………………………………….&quot;</p>
</blockquote>
<p>a much more efficient use of bandwidth, but here’s the double bubble bonus. So long as the server it’s asking returns the new data and closes the connection there’s actually less latency. It doesn’t matter when the new data arrives, but with standard polling you have to wait until the next poll.</p>
<p>In flowchart form, long polling is super simple:</p>
<p> <img style="padding:5px;border-bottom: black 2px solid; border-left: black 2px solid; border-top: black 2px solid; border-right: black 2px solid" src="files/totempollingprocess.png" />
<p>So we&#8217;re all sorted right? Not <em>quite.</em></p>
<p>The problem with long polling using a regular web server is, it’s not very efficient. You end up with a LOT of open connections, and other than having IIS sit there spinning on each ‘poll’ page waiting for new data to come in, there’s not really a nice notification structure either. Apache is even worse on this front as it really dislikes connections being held open. Another minor snag is that you don’t want to query the original hostname for the data. Most browsers only allow you 2 connections per site, so if you tie up one on the polling there’s only one left to actually fetch data.</p>
<p>So, the answer is a dedicated polling server. </p>
<p>These things exist in the *nix world, most notably <a href="http://cometdproject.dojotoolkit.org/" target="_blank">CometD</a>, but it’s a lot to learn just to do something simple.</p>
<p>After 10 minutes of pontificating, I decided to do the obvious. Make my own! Project Totem is born. ( because a Totem is a ‘long pole’ and also as a nod to my friend Sam who runs <a href="http://www.totemdevelopment.co.uk/" target="_blank">Totem Development</a> )</p>
<p>In essence it’s a very simple Windows Sockets application that just pushes Javascript back to the browser. The browser then executes that script and gets the data from the original web server. </p>
<p> <img style="padding:5px;border-bottom: black 2px solid; border-left: black 2px solid; border-top: black 2px solid; border-right: black 2px solid" src="files/totempollingnetwork.png" />
<p>The server generates a GUID that’s sent to each page in the polling javascipt. The server also tells Totem that it’s served that GUID, and that page needs to know about changes to data sets A, B and C. </p>
<p>The browser then polls Totem using the GUID, and if there’s nothing new the request will just time out after 20 seconds. It then polls again, and repeats polling using a 20 second timeout. The very millisecond that Totem gets a notification from the webtier that say data set A has changed, it returns the appropriate Javascript back to the browser and shuts the connection. The browser then does whatever you want to go get the data etc.</p>
<p>I’ll explain more about how I’m tracking keys/scripts and GUIDs etc in part 2 <img src='http://www.briandrought.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=56</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Making the &#8216;New Message Alert&#8217; slicker with jQuery AJAX</title>
		<link>http://www.briandrought.com/blog/?p=54</link>
		<comments>http://www.briandrought.com/blog/?p=54#comments</comments>
		<pubDate>Fri, 13 Nov 2009 09:55:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=54</guid>
		<description><![CDATA[<p>&#160;</p>
<p>On our web platform, when you get sent a ‘U2U’ message, whether it’s from a colleague, or notification of a phone message, you get a little flashing envelope icon in the toolbar. Clicking on this takes you to your inbox. All of this is pretty much as you’d expect. The count of unread messages is [...]]]></description>
			<content:encoded><![CDATA[<p>&#160;</p>
<p>On our web platform, when you get sent a ‘U2U’ message, whether it’s from a colleague, or notification of a phone message, you get a little flashing envelope icon in the toolbar. Clicking on this takes you to your inbox. All of this is pretty much as you’d expect. The count of unread messages is done at page load (well.. taken from memcached anyway) and the button is generated then.</p>
<p>The snag with this is, if you’re expecting a message it can turn you into a bit of a refresh-monkey, reloading the page until you see the unread messages icon.</p>
<p>So we have a page that looks like this:</p>
<p> <img src="files/U2UPoll-Before.png" />
<p>and we need to make that U2U alert a bit more…………. ‘realtime’. Fortunately it’s pretty easy with a little bit of AJAX magic.</p>
<p>First of all we need to make the button have an identifiable &lt;div&gt; so all we do when rendering the button in the Page_Load is do this: &lt;span class=&quot;u2ubutton&quot;&gt;&lt;/span&gt;</p>
<p>Next, we simply create an ASPX page that returns the inner HTML for the button, which obviously depends on the number of messages. To do that we simply have an ASPX that goes a little like this:</p>
<pre class="csharpcode">response.clearcontent
response.cachecontrol = <span class="str">&quot;no-cache&quot;</span>
response.write MessageCount &amp; <span class="str">&quot;  U2Us&quot;</span> </pre>
<style type="text/css">
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>Obviously it’s a bit more complex than that as you only want to display the envelope icon if there’s an unread message, but I’ll spare you the boring part.</p>
<p>Once that’s done it’s a matter of polling that page. In time, I’ll convert it to “Long Polling” for higher efficiency and better response time, but for now, a simple javascript timer will suffice. To poll the button content page, and update the content we can use a single line of JQuery called from a timer:</p>
<pre class="csharpcode">        &lt;script type=<span class="str">&quot;text/javascript&quot;</span>&gt;

            $(document).ready(<span class="kwrd">function</span>() {
                 GetNewU2Us()
            });

            <span class="kwrd">function</span> GetNewU2Us() {
                $(<span class="str">&quot;.u2ubutton&quot;</span>).load(<span class="str">&quot;&lt;%=request.applicationpath%&gt;/returnu2ubuttoncontent.aspx&quot;</span>)
                window.setTimeout(<span class="kwrd">function</span>() {
                    GetNewU2Us()
                }, 10000);
            }

    &lt;/script&gt;</pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>The initial page load calls GetNewU2Us which updates the div with the class ‘u2ubutton’ with the HTML that’s spat out by our button page. It then starts a timer off so the event will happen every 10 seconds.</p>
<p>So far so good then, our message alert now works nicely in the background and updates every 10 seconds.</p>
<p>If you’re not currently viewing that page though you might not see it. If you’ve used the Twitter web page recently you may have spotted a nice new addition. If more tweets come in, it updates the page title to ‘(3) Twitter / Home’ signifying there are 3 unread tweets. You can see that number in your browser tab, so even if you’re doing something else you can see at a glance you have new messages.</p>
<p>Replicating this with our solution is a doddle. In our&#160; returnu2ubuttoncontent.aspx page we just insert a bit of Javascript after the button text:</p>
<pre class="csharpcode">       <span class="kwrd">If</span> MessageCount &gt; 0 <span class="kwrd">Then</span>
            ButtonText += <span class="str">&quot;&lt;script language=&quot;</span><span class="str">&quot;JavaScript&quot;</span><span class="str">&quot;&gt;&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;var leftchar = document.title.substring(0, 1)&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;if (leftchar == '(')&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;{&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;var oldtitle = document.title&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;var rhb = oldtitle.indexOf(&quot;</span><span class="str">&quot;) &quot;</span><span class="str">&quot;)&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;oldtitle = oldtitle.substring(rhb + 1)&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;document.title = '(&quot;</span> &amp; MessageCount &amp; <span class="str">&quot;) ' + oldtitle&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;}&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;else&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;{&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;document.title = '(&quot;</span> &amp; MessageCount &amp; <span class="str">&quot;) ' + document.title&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;}&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;&lt;/script&gt;&quot;</span> &amp; vbCrLf
        <span class="kwrd">Else</span>
            ButtonText += <span class="str">&quot;&lt;script language=&quot;</span><span class="str">&quot;JavaScript&quot;</span><span class="str">&quot;&gt;&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;var leftchar = document.title.substring(0, 1)&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;if (leftchar == '(')&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;{&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;var oldtitle = document.title&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;var rhb = oldtitle.indexOf(&quot;</span><span class="str">&quot;) &quot;</span><span class="str">&quot;)&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;oldtitle = oldtitle.substring(rhb + 1)&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;document.title = oldtitle&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;}&quot;</span> &amp; vbCrLf
            ButtonText += <span class="str">&quot;&lt;/script&gt;&quot;</span> &amp; vbCrLf
        <span class="kwrd">End</span> If</pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
</p>
<p>Apologies for not just pasting in the resultant javascript, but you can see what it does. If there’s an unread message, if inserts the top blob which strips a ‘(xx)’ from the title if there’s already one and then adds the new count. If the message count is 0 it just strips the (xx) if it exists. Not pretty but it works!</p>
<p>So after that bit of work, we’re left with:</p>
<p><img src="files/U2UPoll-After.png" /></p>
<p>Obviously the background polling and the nice magic updating aren’t apparent in a screenshot, but the end result is super slick…!!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=54</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reducing operator stress &#8211; Answer Phrase</title>
		<link>http://www.briandrought.com/blog/?p=52</link>
		<comments>http://www.briandrought.com/blog/?p=52#comments</comments>
		<pubDate>Fri, 06 Nov 2009 10:10:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=52</guid>
		<description><![CDATA[<p>&#160;</p>
<p>Our call agents have a pretty hard job. They answer around 300 calls per shift, and probably 50% of those calls are for unique clients. It’s not the same as say a call centre for Barclays where you answer the phone in the same way every time. Most of the time, they’ll have to answer [...]]]></description>
			<content:encoded><![CDATA[<p>&#160;</p>
<p>Our call agents have a pretty hard job. They answer around 300 calls per shift, and probably 50% of those calls are for unique clients. It’s not the same as say a call centre for Barclays where you answer the phone in the same way every time. Most of the time, they’ll have to answer the phone with a different greeting every time.</p>
<p>95% of our clients have a standard greeting of ‘Good [timeofday][company], how can I help you?’ so the operators have that down pretty well. The only snag is, since the client can change the greeting themselves, our operators are forced to read to whole answer phrase each time as they don’t know if it’s the standard greeting or not.</p>
<p>A few lines of Regex.Replace code and a bit of CSS styling, and we have a nice little mod. Instead of the whole greeting being in bold red text, if it’s the standard “Good morning [companyname], how can I help you?” then we <strong>lowlight</strong> (yes that’s a word!) the ‘standard’ parts.</p>
<p>Example:</p>
<p> <img src="files/revisedanswerphrase.png" />
<p>Look how much easier that is on the eye and how much quicker your brain can do the mental ‘replace’. </p>
<p>This should hopefully reduce the mental weight on the operator in that first half second of the call, allowing them to give a more natural greeting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=52</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting Extension Number from Local Machine</title>
		<link>http://www.briandrought.com/blog/?p=51</link>
		<comments>http://www.briandrought.com/blog/?p=51#comments</comments>
		<pubDate>Mon, 28 Sep 2009 15:19:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=51</guid>
		<description><![CDATA[<p>&#160;</p>
<p>For proper interfacing between computer and telephone, the computer system and the telephone system need to know which operator is on which extension. </p>
<p>On each machine we currently keep the extension in c:\extension.cfg. The &#8216;CTI&#8217; program that forces the webpages to &#8216;pop&#8217; uses this to know which extension it&#8217;s on and which extension to listen [...]]]></description>
			<content:encoded><![CDATA[<p>&#160;</p>
<p>For proper interfacing between computer and telephone, the computer system and the telephone system need to know which operator is on which extension. </p>
<p>On each machine we currently keep the extension in c:\extension.cfg. The &#8216;CTI&#8217; program that forces the webpages to &#8216;pop&#8217; uses this to know which extension it&#8217;s on and which extension to listen for on the main feed from the switchboard. It knows that if line 4000 gets answered on it&#8217;s workstation it needs to load <a href="http://ourserver/screenloader.aspx?line=4000">http://ourserver/screenloader.aspx?line=4000</a> </p>
<p>All nice and simple. However, with our new switch we need a bit more. </p>
<p>The telephone system needs to know which user is on which extension so that when the user clicks on &#8216;Dial 01234-567890&#8242; the switchboard will know which extension to make that call from. </p>
<p>The computer system needs to know for stats and when the telephone system says &quot;Ext 123 has picked up line 4000&quot; the computer knows to pop the right screen on the right extension. </p>
<p>So our web based system needs to know which user is sat at which physical machine. There are a couple of simple ways we could do this:</p>
<ol>
<li>Use the IP address to work out which extension is being used. </li>
<li>Make the user type it in after they login. </li>
<li>Make the CTI program the main login and have it pass the extension number over as part of the login data. </li>
<li>Something else. </li>
</ol>
<p>Hmmm. Option 1 is all fine until you have a roaming user. Option 2 would be a disaster. Option 3 would work, so long as they remembered to login with the CTI program and not via the normal webpage.</p>
<p>So option 4 it is… get the data from c:\extension.cfg into the database by using the normal webpage from IE/FF/Chrome. Right… should be easy….</p>
<p>Plain old Javascript was out as reading from the hard drive is a no no. ActiveX would do it, but IE only. There needed to be another way. </p>
<p>Fortunately there was. Our CTI program. That runs on every machine and knows the extension number. We just need to communicate with it. At first, I thought a quick and dirty bit of AJAX would suffice and I can make a call to <a href="http://localhost:1234/">http://localhost:1234/</a> where we’d have our CTI client listening on port 1234 waiting to respond. Sadly, browsers don’t like doing cross domain calls, and as far as it’s concerned, localhost is a different domain. Something more cunning was needed.</p>
<p><a href="http://jquery.com/" target="_blank">JQuery</a> is our saviour here. More exactly, <a href="http://docs.jquery.com/Ajax/jQuery.getJSON" target="_blank">jQuery.getJSON</a> is our saviour. We need to use JSONP which allows us to do cross site queries like you’d do into Flickr etc. All we need to do is to write a little server into the CTI app that returns some JSON data. Well, not quite. It’s JSONP. The difference is, the calling application, in this case JQuery passes a parameter that it expects as a call back.&#160; You ask it to get you <a href="http://server/getextension.scc?user=abc&amp;jsoncallback">http://server/getextension.scc?user=abc&amp;jsoncallback</a>=?&#160;&#160;&#160;&#160; and it replaces that 2nd ‘?’ with a random name. All you then need to do is use that as the method name for your JSON data. </p>
<p>&#160;</p>
<pre class="csharpcode">        <span class="kwrd">Dim</span> RequestParts <span class="kwrd">As</span> <span class="kwrd">String</span>() = Split(httprequest, <span class="str">&quot;?&quot;</span>)
        <span class="kwrd">Dim</span> pagename <span class="kwrd">As</span> <span class="kwrd">String</span> = RequestParts(0)
        <span class="kwrd">Dim</span> params <span class="kwrd">As</span> <span class="kwrd">New</span> SortedList

        <span class="kwrd">If</span> RequestParts.Count &gt; 1 <span class="kwrd">Then</span>
            <span class="kwrd">Dim</span> tmp_params() <span class="kwrd">As</span> <span class="kwrd">String</span> = Split(RequestParts(1), <span class="str">&quot;&amp;&quot;</span>)
            <span class="kwrd">For</span> <span class="kwrd">Each</span> tmp_param <span class="kwrd">In</span> tmp_params
                <span class="kwrd">If</span> tmp_param.Contains(<span class="str">&quot;=&quot;</span>) <span class="kwrd">Then</span>
                    <span class="kwrd">Dim</span> components() <span class="kwrd">As</span> <span class="kwrd">String</span> = Split(tmp_param, <span class="str">&quot;=&quot;</span>)
                    params.Add(components(0), components(1))
                <span class="kwrd">End</span> <span class="kwrd">If</span>
            <span class="kwrd">Next</span>
        <span class="kwrd">End</span> <span class="kwrd">If</span>

        <span class="kwrd">Select</span> pagename.ToLower
            <span class="kwrd">Case</span> <span class="str">&quot;maintainlogin.scc&quot;</span>
                ComposeResponse = params(<span class="str">&quot;jsoncallback&quot;</span>) &amp; <span class="str">&quot;({&quot;</span>
                ComposeResponse += <span class="str">&quot;&quot;</span><span class="str">&quot;extension&quot;</span><span class="str">&quot;:&quot;</span><span class="str">&quot;&quot;</span> &amp; extension &amp; <span class="str">&quot;&quot;</span><span class="str">&quot;,&quot;</span>
                ComposeResponse += <span class="str">&quot;&quot;</span><span class="str">&quot;switchboard&quot;</span><span class="str">&quot;:&quot;</span><span class="str">&quot;JAM&quot;</span><span class="str">&quot;&quot;</span>
                ComposeResponse += <span class="str">&quot;})&quot;</span>
        <span class="kwrd">End</span> Select</pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>&#160;</p>
<p>So we call <a href="http://server:1234/getextension.scc?user=abc&amp;datacallback">http://server:1234/getextension.scc?user=abc&amp;datacallback</a>=? which JSON changes to (for example) <a href="http://server/getextension.scc?user=abc&amp;datacallback=JSONP123456789">http://server/getextension.scc?user=abc&amp;datacallback=JSONP123456789</a></p>
<p>Our ‘server’ app then responds with valid JSONP response of&#160; JSONP1234567890({“extension”:”2345”,”switchboard”:”JAM”})</p>
<p>We then need to send that data back to the main webserver so it can be written into the database. Fortunately, this isn’t cross domain and is a piece of cake with a JQuery ‘load’ event. ‘Load’ just gives you the HTML from a page you request and inserts it wherever you like into the DOM. I created a page called ‘extlogin.aspx’ that you pass the extension and the usercode to. It returns some HTML to notify the user of success.</p>
<p>All in all, on the main webpage that is sent to the client we simply need to include the JQuery library and do this in the body:</p>
<pre class="csharpcode">    &lt;div id=<span class="str">&quot;localcontent&quot;</span>&gt;
    &lt;/div&gt;

    &lt;script type=<span class="str">&quot;text/javascript&quot;</span>&gt;
  $(document).ready(<span class="kwrd">function</span>(){
    $.getJSON(<span class="str">&quot;http://127.0.0.1:1234/maintainlogin.scc?usercode=&lt;%=UserCode %&gt;&amp;jsoncallback=?&quot;</span>,
        <span class="kwrd">function</span>(data){
               $(<span class="str">&quot;#localcontent&quot;</span>).load(<span class="str">&quot;extlogin.aspx?extension=&quot;</span> + data.extension + <span class="str">&quot;&amp;usercode=&lt;%=UserCode %&gt;&quot;</span>);
          });
        });
    &lt;/script&gt;</pre>
<pre class="csharpcode">&#160;</pre>
<p>That calls our local server on port 1234, parses the result and pushes it back to the main webserver. Easy!</p>
<p>Here’s a screenshot of it in action:</p>
<p><img src="./files/blog-extensionnumber1.png" /> </p>
<p></p>
<p>and &#8216;proof&#8217; of it being written into the DB on the main server&#8230;</p>
<p><img src="./files/blog-extensionnumber2.png" /> </p>
<p></p>
<p>&#160;</p>
<p>So from local text file, to DB on webserver in a few easy steps with the help of JQuery, JSON and some imagination. Easy!! </p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=51</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Partitioning Data in SQL</title>
		<link>http://www.briandrought.com/blog/?p=46</link>
		<comments>http://www.briandrought.com/blog/?p=46#comments</comments>
		<pubDate>Fri, 17 Jul 2009 16:47:04 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.briandrought.com/blog/?p=46</guid>
		<description><![CDATA[<p>(Disclaimer: Yes, I know there are ways to ‘let SQL do this for you’ but they’re bloody complicated, and I have a really nice, even hash key to use so it’s simple!)</p>
<p>(Disclaimer 2: This is a pretty long post.)</p>
<p>So, after a few years of usage gathering data, SQL tables can become pretty large. Especially if [...]]]></description>
			<content:encoded><![CDATA[<p><em>(Disclaimer: Yes, I know there are ways to ‘let SQL do this for you’ but they’re bloody complicated, and I have a really nice, even hash key to use so it’s simple!)</em></p>
<p><em>(Disclaimer 2: This is a pretty long post.)</em></p>
<p>So, after a few years of usage gathering data, SQL tables can become pretty large. Especially if they’re wide tables. Consequently you also end up with a big database as well. What we want to do is to partition my data out into lots of smaller tables, across multiple databases.&#160; For example, a message is stored in a ‘messages’ table with the company code the message is for in the row. This makes an excellent partition as it’s very unlikely you’ll be wanting to do a query on messages for different companies. Similarly, you might partition user preferences by user.</p>
<p>You could simply partition everything by using the first letter of the name. Things for ‘Acme’ go into ‘MessagesA’ etc. However, the natural distribution of letters will mean some tables are vastly bigger than others.&#160;&#160; In my data I have hashcodes for everything which look like this: “F6QHC5TYB”. Think of them as mini GUID’s. Since these are pretty even, having an algorithm that decides where the data goes based on the hashcode should work fine.</p>
<p>Splitting across different tables is only half of the story though. You really want to be able to put those different tables into a few different databases, for example, with 4 tables split across 2 databases:</p>
<table border="1" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td valign="top" width="133">First Letter of Hash</td>
<td valign="top" width="133">Table</td>
<td valign="top" width="133">Database</td>
</tr>
<tr>
<td valign="top" width="133">&#160;</td>
<td valign="top" width="133">&#160;</td>
<td valign="top" width="133">&#160;</td>
</tr>
<tr>
<td valign="top" width="133">A</td>
<td valign="top" width="133">Messages1</td>
<td valign="top" width="133">Database1</td>
</tr>
<tr>
<td valign="top" width="133">B</td>
<td valign="top" width="133">Messages2</td>
<td valign="top" width="133">Database2</td>
</tr>
<tr>
<td valign="top" width="133">C</td>
<td valign="top" width="133">Messages3</td>
<td valign="top" width="133">Database1</td>
</tr>
<tr>
<td valign="top" width="133">D</td>
<td valign="top" width="133">Messages4</td>
<td valign="top" width="133">Database2</td>
</tr>
<tr>
<td valign="top" width="133">E</td>
<td valign="top" width="133">Messages1</td>
<td valign="top" width="133">Database1</td>
</tr>
<tr>
<td valign="top" width="133">F</td>
<td valign="top" width="133">Messages2</td>
<td valign="top" width="133">Database2</td>
</tr>
</tbody>
</table>
<p>So Messages1 &amp; Messages3 live in Database1, Messages 2 and 4 live in Database2. All we need to do is to have something that generates a suffix for the table and the database based on the hashcode. You could then simply do:</p>
<p><font size="4" face="Courier New"> &#8220;SELECT * From &#8221; &amp; GetDatabase(Hashcode) &amp; &#8220;..&#8221; &amp; GetTable(Hashcode) &amp; &#8221; Where blah blah blah&#8221; </font></p>
<p>Lets knock up a couple of functions then…..</p>
<p>&#160;</p>
<p><strong>For the Database</strong></p>
<pre class="csharpcode">    <span class="kwrd">Public</span> <span class="kwrd">Shared</span> <span class="kwrd">Function</span> PartitioningGetDatabase(<span class="kwrd">ByVal</span> HashCode <span class="kwrd">As</span> <span class="kwrd">String</span>, <span class="kwrd">ByVal</span> DataType <span class="kwrd">As</span> PartitioningDataType) <span class="kwrd">As</span> <span class="kwrd">String</span>
        <span class="kwrd">Dim</span> DatabaseName <span class="kwrd">As</span> <span class="kwrd">String</span> = <span class="str">&quot;&quot;</span>
        <span class="kwrd">Dim</span> DataBaseSuffix <span class="kwrd">As</span> <span class="kwrd">String</span> = <span class="str">&quot;&quot;</span>
        <span class="kwrd">Dim</span> SuffixRange <span class="kwrd">As</span> <span class="kwrd">Integer</span> = 2

        <span class="kwrd">Select</span> <span class="kwrd">Case</span> DataType</pre>
<pre class="csharpcode">            <span class="kwrd">Case</span> PartitioningDataType.Message
                DatabaseName = <span class="str">&quot;Database&quot;</span>
        <span class="kwrd">End</span> <span class="kwrd">Select</span>

        <span class="kwrd">Dim</span> _HashNumber <span class="kwrd">As</span> <span class="kwrd">Integer</span> = Asc(Left(HashCode, 1))
        <span class="kwrd">Dim</span> _HashOutNumber <span class="kwrd">As</span> <span class="kwrd">Integer</span> = 0

        <span class="kwrd">If</span> _HashNumber &gt; 47 <span class="kwrd">And</span> _HashNumber &lt; 58 <span class="kwrd">Then</span> _HashOutNumber = _HashNumber - 48
        <span class="kwrd">If</span> _HashNumber &gt; 64 <span class="kwrd">And</span> _HashNumber &lt; 91 <span class="kwrd">Then</span>  _HashOutNumber = _HashNumber - 55

        <span class="kwrd">Dim</span> _DataBaseNumber = _HashOutNumber <span class="kwrd">Mod</span> SuffixRange

        DataBaseSuffix = Chr(49 + _DataBaseNumber)

        <span class="kwrd">Return</span> DatabaseName &amp; DataBaseSuffix
    <span class="kwrd">End</span> Function</pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
</p>
<p>&#160;</p>
<p><strong>For the Table</strong></p>
<pre class="csharpcode">    <span class="kwrd">Public</span> <span class="kwrd">Shared</span> <span class="kwrd">Function</span> PartitioningGetTable(<span class="kwrd">ByVal</span> HashCode <span class="kwrd">As</span> <span class="kwrd">String</span>, <span class="kwrd">ByVal</span> DataType <span class="kwrd">As</span> PartitioningDataType) <span class="kwrd">As</span> <span class="kwrd">String</span>
        <span class="kwrd">Dim</span> TableSuffix <span class="kwrd">As</span> <span class="kwrd">String</span> = <span class="str">&quot;&quot;</span>
        <span class="kwrd">Dim</span> SuffixRange <span class="kwrd">As</span> <span class="kwrd">Integer</span> = 8

        <span class="kwrd">Select</span> <span class="kwrd">Case</span> DataType
            <span class="rem">'Here was can set different ranges based on different data types</span>
            <span class="kwrd">Case</span> PartitioningDataType.Message
                SuffixRange = 4
        <span class="kwrd">End</span> <span class="kwrd">Select</span>

        <span class="kwrd">Dim</span> _HashNumber <span class="kwrd">As</span> <span class="kwrd">Integer</span> = Asc(Left(HashCode, 1))
        <span class="kwrd">Dim</span> _HashOutNumber <span class="kwrd">As</span> <span class="kwrd">Integer</span> = 0

        <span class="kwrd">If</span> _HashNumber &gt; 47 <span class="kwrd">And</span> _HashNumber &lt; 58 <span class="kwrd">Then</span> _HashOutNumber = _HashNumber - 48
        <span class="kwrd">If</span> _HashNumber &gt; 64 <span class="kwrd">And</span> _HashNumber &lt; 91 <span class="kwrd">Then</span> _HashOutNumber = _HashNumber - 55

        <span class="kwrd">Dim</span> _TableNumber = _HashOutNumber <span class="kwrd">Mod</span> SuffixRange

        <span class="kwrd">If</span> _TableNumber &lt; 9 <span class="kwrd">Then</span>
            TableSuffix = Chr(49 + _TableNumber)
        <span class="kwrd">Else</span>
            TableSuffix = Chr(56 + _TableNumber)
        <span class="kwrd">End</span> <span class="kwrd">If</span>

        <span class="kwrd">Return</span> TableSuffix
    <span class="kwrd">End</span> Function</pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>&#160;</p>
<p>The ‘suffixrange’ is what determines how big the spread is for the tables and the databases and is variable dependant on the type of datas. This way you can easily have it set to spread very heavily used data across more tables/db’s than less heavy data.</p>
<p>So let’s have a look at some data output using a simple test rig like this:</p>
<pre class="csharpcode">        <span class="kwrd">Dim</span> Chars <span class="kwrd">As</span> <span class="kwrd">String</span> = <span class="str">&quot;ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&quot;</span>

        <span class="kwrd">For</span> i <span class="kwrd">As</span> <span class="kwrd">Integer</span> = 1 <span class="kwrd">To</span> Len(Chars)</pre>
<pre class="csharpcode">            <span class="kwrd">Dim</span> ThisChar <span class="kwrd">As</span> <span class="kwrd">String</span> = Mid(Chars, i, 1)
            <span class="kwrd">Dim</span> ThisTable <span class="kwrd">As</span> <span class="kwrd">String</span> = PartitioningGetTable(ThisChar, PartitioningDataType.Message)
            <span class="kwrd">Dim</span> ThisDatabase <span class="kwrd">As</span> <span class="kwrd">String</span> = PartitioningGetDatabase(ThisChar, PartitioningDataType.Message)
            System.Diagnostics.Debug.Print(ThisChar &amp; <span class="str">&quot;,&quot;</span> &amp; ThisDatabase &amp; <span class="str">&quot;,Messages&quot;</span> &amp; ThisTable)
        <span class="kwrd">Next</span> i</pre>
<pre class="csharpcode">&#160;</pre>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<table style="border-bottom: #666 1px solid; border-left: #666 1px solid; border-top: #666 1px solid; border-right: #666 1px solid" cellpadding="5">
<tbody>
<tr>
<th>1st Char of Hash</th>
<th>Database</th>
<th>Table</th>
</tr>
<tr>
<td>A</td>
<td>DataShard2</td>
<td>Messages3</td>
</tr>
<tr>
<td>B</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>C</td>
<td>DataShard2</td>
<td>Messages5</td>
</tr>
<tr>
<td>D</td>
<td>DataShard1</td>
<td>Messages6</td>
</tr>
<tr>
<td>E</td>
<td>DataShard2</td>
<td>Messages7</td>
</tr>
<tr>
<td>F</td>
<td>DataShard1</td>
<td>Messages8</td>
</tr>
<tr>
<td>G</td>
<td>DataShard2</td>
<td>Messages1</td>
</tr>
<tr>
<td>H</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
<tr>
<td>I</td>
<td>DataShard2</td>
<td>Messages3</td>
</tr>
<tr>
<td>J</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>K</td>
<td>DataShard2</td>
<td>Messages5</td>
</tr>
<tr>
<td>L</td>
<td>DataShard1</td>
<td>Messages6</td>
</tr>
<tr>
<td>M</td>
<td>DataShard2</td>
<td>Messages7</td>
</tr>
<tr>
<td>N</td>
<td>DataShard1</td>
<td>Messages8</td>
</tr>
<tr>
<td>O</td>
<td>DataShard2</td>
<td>Messages1</td>
</tr>
<tr>
<td>P</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
<tr>
<td>Q</td>
<td>DataShard2</td>
<td>Messages3</td>
</tr>
<tr>
<td>R</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>S</td>
<td>DataShard2</td>
<td>Messages5</td>
</tr>
<tr>
<td>T</td>
<td>DataShard1</td>
<td>Messages6</td>
</tr>
<tr>
<td>U</td>
<td>DataShard2</td>
<td>Messages7</td>
</tr>
<tr>
<td>V</td>
<td>DataShard1</td>
<td>Messages8</td>
</tr>
<tr>
<td>W</td>
<td>DataShard2</td>
<td>Messages1</td>
</tr>
<tr>
<td>X</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
<tr>
<td>Y</td>
<td>DataShard2</td>
<td>Messages3</td>
</tr>
<tr>
<td>Z</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>0</td>
<td>DataShard2</td>
<td>Messages1</td>
</tr>
<tr>
<td>1</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
<tr>
<td>2</td>
<td>DataShard2</td>
<td>Messages3</td>
</tr>
<tr>
<td>3</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>4</td>
<td>DataShard2</td>
<td>Messages5</td>
</tr>
<tr>
<td>5</td>
<td>DataShard1</td>
<td>Messages6</td>
</tr>
<tr>
<td>6</td>
<td>DataShard2</td>
<td>Messages7</td>
</tr>
<tr>
<td>7</td>
<td>DataShard1</td>
<td>Messages8</td>
</tr>
<tr>
<td>8</td>
<td>DataShard2</td>
<td>Messages1</td>
</tr>
<tr>
<td>9</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
</tbody>
</table>
<p>&#160;</p>
<p>All looking good ? OK, let’s change the suffixrange to spread it across 3 databases instead of 2…</p>
<p>&#160;</p>
<table border="1" cellspacing="0" cellpadding="2" width="800">
<tbody>
<tr>
<td valign="top" width="400">
<h3>8 tables, 2 DBs</h3>
</td>
<td valign="top" width="400">
<h3>8 tables, 3 DBs</h3>
</td>
</tr>
<tr>
<td valign="top" width="400">
<table cellpadding="5">
<tbody>
<tr>
<th>1st Char of Hash</th>
<th>Database</th>
<th>Table</th>
</tr>
<tr>
<td>A</td>
<td>DataShard1</td>
<td>Messages3</td>
</tr>
<tr>
<td>B</td>
<td>DataShard2</td>
<td>Messages4</td>
</tr>
<tr>
<td>C</td>
<td>DataShard1</td>
<td>Messages5</td>
</tr>
<tr>
<td>D</td>
<td>DataShard2</td>
<td>Messages6</td>
</tr>
<tr>
<td>E</td>
<td>DataShard1</td>
<td>Messages7</td>
</tr>
<tr>
<td>F</td>
<td>DataShard2</td>
<td>Messages8</td>
</tr>
<tr>
<td>G</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>H</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
<tr>
<td>I</td>
<td>DataShard1</td>
<td>Messages3</td>
</tr>
<tr>
<td>J</td>
<td>DataShard2</td>
<td>Messages4</td>
</tr>
<tr>
<td>K</td>
<td>DataShard1</td>
<td>Messages5</td>
</tr>
<tr>
<td>L</td>
<td>DataShard2</td>
<td>Messages6</td>
</tr>
<tr>
<td>M</td>
<td>DataShard1</td>
<td>Messages7</td>
</tr>
<tr>
<td>N</td>
<td>DataShard2</td>
<td>Messages8</td>
</tr>
<tr>
<td>O</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>P</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
<tr>
<td>Q</td>
<td>DataShard1</td>
<td>Messages3</td>
</tr>
<tr>
<td>R</td>
<td>DataShard2</td>
<td>Messages4</td>
</tr>
<tr>
<td>S</td>
<td>DataShard1</td>
<td>Messages5</td>
</tr>
<tr>
<td>T</td>
<td>DataShard2</td>
<td>Messages6</td>
</tr>
<tr>
<td>U</td>
<td>DataShard1</td>
<td>Messages7</td>
</tr>
<tr>
<td>V</td>
<td>DataShard2</td>
<td>Messages8</td>
</tr>
<tr>
<td>W</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>X</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
<tr>
<td>Y</td>
<td>DataShard1</td>
<td>Messages3</td>
</tr>
<tr>
<td>Z</td>
<td>DataShard2</td>
<td>Messages4</td>
</tr>
<tr>
<td>0</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>1</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
<tr>
<td>2</td>
<td>DataShard1</td>
<td>Messages3</td>
</tr>
<tr>
<td>3</td>
<td>DataShard2</td>
<td>Messages4</td>
</tr>
<tr>
<td>4</td>
<td>DataShard1</td>
<td>Messages5</td>
</tr>
<tr>
<td>5</td>
<td>DataShard2</td>
<td>Messages6</td>
</tr>
<tr>
<td>6</td>
<td>DataShard1</td>
<td>Messages7</td>
</tr>
<tr>
<td>7</td>
<td>DataShard2</td>
<td>Messages8</td>
</tr>
<tr>
<td>8</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>9</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
</tbody>
</table>
</td>
<td valign="top" width="400">
<table cellpadding="5">
<tbody>
<tr>
<th>1st Char </th>
<th>Database</th>
<th>Table</th>
</tr>
<tr>
<td>A</td>
<td><strong>DataShard2</strong></td>
<td><strong>Messages3</strong></td>
</tr>
<tr>
<td>B</td>
<td>DataShard3</td>
<td>Messages4</td>
</tr>
<tr>
<td>C</td>
<td>DataShard1</td>
<td>Messages5</td>
</tr>
<tr>
<td>D</td>
<td>DataShard2</td>
<td>Messages6</td>
</tr>
<tr>
<td>E</td>
<td>DataShard3</td>
<td>Messages7</td>
</tr>
<tr>
<td>F</td>
<td>DataShard1</td>
<td>Messages8</td>
</tr>
<tr>
<td>G</td>
<td>DataShard2</td>
<td>Messages1</td>
</tr>
<tr>
<td>H</td>
<td>DataShard3</td>
<td>Messages2</td>
</tr>
<tr>
<td>I</td>
<td><strong>DataShard1</strong></td>
<td><strong>Messages3</strong></td>
</tr>
<tr>
<td>J</td>
<td>DataShard2</td>
<td>Messages4</td>
</tr>
<tr>
<td>K</td>
<td>DataShard3</td>
<td>Messages5</td>
</tr>
<tr>
<td>L</td>
<td>DataShard1</td>
<td>Messages6</td>
</tr>
<tr>
<td>M</td>
<td>DataShard2</td>
<td>Messages7</td>
</tr>
<tr>
<td>N</td>
<td>DataShard3</td>
<td>Messages8</td>
</tr>
<tr>
<td>O</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>P</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
<tr>
<td>Q</td>
<td><strong>DataShard3</strong></td>
<td><strong>Messages3</strong></td>
</tr>
<tr>
<td>R</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>S</td>
<td>DataShard2</td>
<td>Messages5</td>
</tr>
<tr>
<td>T</td>
<td>DataShard3</td>
<td>Messages6</td>
</tr>
<tr>
<td>U</td>
<td>DataShard1</td>
<td>Messages7</td>
</tr>
<tr>
<td>V</td>
<td>DataShard2</td>
<td>Messages8</td>
</tr>
<tr>
<td>W</td>
<td>DataShard3</td>
<td>Messages1</td>
</tr>
<tr>
<td>X</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
<tr>
<td>Y</td>
<td><strong>DataShard2</strong></td>
<td><strong>Messages3</strong></td>
</tr>
<tr>
<td>Z</td>
<td>DataShard3</td>
<td>Messages4</td>
</tr>
<tr>
<td>0</td>
<td>DataShard1</td>
<td>Messages1</td>
</tr>
<tr>
<td>1</td>
<td>DataShard2</td>
<td>Messages2</td>
</tr>
<tr>
<td>2</td>
<td><strong>DataShard3</strong></td>
<td><strong>Messages3</strong></td>
</tr>
<tr>
<td>3</td>
<td>DataShard1</td>
<td>Messages4</td>
</tr>
<tr>
<td>4</td>
<td>DataShard2</td>
<td>Messages5</td>
</tr>
<tr>
<td>5</td>
<td>DataShard3</td>
<td>Messages6</td>
</tr>
<tr>
<td>6</td>
<td>DataShard1</td>
<td>Messages7</td>
</tr>
<tr>
<td>7</td>
<td>DataShard2</td>
<td>Messages8</td>
</tr>
<tr>
<td>8</td>
<td>DataShard3</td>
<td>Messages1</td>
</tr>
<tr>
<td>9</td>
<td>DataShard1</td>
<td>Messages2</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>Whoops!!! What happened there? The table Messages3 is now in 3 different databases. That’s not good. Whilst it’d be ‘workable’ it’s messy. If you ever want to move the data around again it’s going to be very hard work. So what went wrong? Well… look at our code that calls the functions again…..</p>
<pre class="csharpcode"><span class="kwrd"></span></pre>
<blockquote>
<pre class="csharpcode"><span class="kwrd">Dim</span> ThisTable <span class="kwrd">As</span> <span class="kwrd">String</span> = PartitioningGetTable(ThisChar, PartitioningDataType.Message)
<span class="kwrd">Dim</span> ThisDatabase <span class="kwrd">As</span> <span class="kwrd">String</span> = PartitioningGetDatabase(ThisChar, PartitioningDataType.Message)</pre>
</blockquote>
<p>&#160;</p>
<p>Can you see the problem&#160; ?<br />
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
</p>
<p>The table and the database are determined on the hash of the row we’re storing. Sounds correct yes? If you ran 8 tables on 4 DB’s it’d look fine against the original data, but since we’re using a non divisible numbers, the suffixes go out of sync. You could just assume that you’d be sensible and move from say 2 DB’s to 4, or we make a teeny tiny code change……..</p>
<pre class="csharpcode"><span class="kwrd"></span></pre>
<blockquote>
<pre class="csharpcode"><span class="kwrd">Dim</span> ThisTable <span class="kwrd">As</span> <span class="kwrd">String</span> = PartitioningGetTable(ThisChar, PartitioningDataType.Message)
<span class="kwrd">Dim</span> ThisDatabase <span class="kwrd">As</span> <span class="kwrd">String</span> = PartitioningGetDatabase(This<strong><u>Table</u></strong>, PartitioningDataType.Message)</pre>
</blockquote>
<style type="text/css">
<p>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }</style>
<p>&#160;</p>
<p>Not a very complicated change is it, but it should be a fundamental…. The hashcode must just determine which table to use, and<strong> the table suffix should always determine the DB. </strong>That way you always always have the same tables in the same DB’s. </p>
<p>Hope that helps someone out before they make a massively silly mistake!</p>
<p>I’ll post up real life distribution data based on my keys at a later date.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.briandrought.com/blog/?feed=rss2&amp;p=46</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
