Less is More

Sometimes it really is true.  Let me tell you a story about a really powerful server.  We’ll call him Bob.  Bob is impressive.  RAM? Yeah, he’s got that, to the tune of a terabyte.  He has no problems with premature ejection of data from his cache.  Brains?  He’s like a rocket scientist on steroids, with 8 physical processors each sporting 10 cores.  Truly a dizzying intellect.  All those brains aren’t very useful without some data to back them up, so he snuggled up to a SAN sporting more spindles than you can shake a stick at.  Yes, you might say Bob was pretty happy with himself.  He was more than ready to take on the workload from our demanding SQL Server (let’s call her Sheila), thinking he wouldn’t break a sweat.

He was right.  Heck, he had the performance counters to prove it: He never went above 50% CPU.  Sheila, on the other hand, wasn’t happy.  She wanted more CPU, but try as he might, Bob just couldn’t seem to get her to take his massive CPU offering.  You see, Sheila was a perfect 10.0  — her @@Version, that is.  Windows took the 80 logical cores Bob had to offer and split them up into 2 processor K-Groups, each with 40 cores.  Since Sheila was so old school, she just wasn’t ready to be involved in a multiple K-Group relationship.  Something had to be done.  Bob’s peers in his “cluster” of friends suggested coaxing Sheila into 10.5 territory, where she’d be more open to doubling up on K-Groups, but Bob knew that trying to bring Sheila more current would likely mean she’d be leaving him for different hardware, and he just couldn’t stand the thought of losing her.

So Bob racked his BIOS looking for an answer to his problems.  It was there that he found the answer.  He had to be gentler with Sheila.  Rather than showing her everything he had to offer, why not tone it down a little?  In his BIOS he activated a feature to disable processor cores, telling each processor to behave as if it only had 8 cores instead of 10.  With only 64 cores visible to Windows, it saw no need to split them into K-Groups.  Ironically, by showing her less, Sheila was able to see more of what Bob had to offer.  Bob and Sheila would have more time together after all.  He might be sweating a little bit more now, but the sweaty part of the relationship is often the most fun.

Sometimes, Fewer is Better
Bob’s first big task with Sheila, given their new and empowered relationship status, was to see how much of an improvement all of this available brain power would have on Sheila’s CHECKTABLE performance.  So she queued up her largest database and got to work.  Something strange happened, though.  When presented with even more processing power, Sheila had problems bringing it all to bear and delivering the kind of performance she knew she was capable of with Bob.  Firing up her perfmon toolbox, she could see she was getting a dismal rate of throughput from all the shiny spindles Bob was so friendly with.  But it wasn’t the spindles that were the problem.  Latency was low – less than 5ms on average– so the spindles were delivering whenever they were asked.  They were delivering on target, too, with the right block size and alignment.  What could be the problem?
Maybe she just needed to try more things at once?  Instead of doing a single CHECKTABLE, she fired up multiple sessions of DBCC CHECKTABLE to try to spice things up.  It worked –disk throughput improved– but it was a bit unwieldy.  Bob had a script to refresh the SAN volumes presented to Sheila, attach them, and kick off a series of CHECKTABLE commands for each database.  When that script was done, a report was e-mailed to the DBA team detailing the results.  It would take a lot of work to revamp that process to spin up multiple sessions, aggregate the data accurately, and handle the timing of the volume refresh.  What’s a girl to do?  Checking this database was taking her 4 DAYS, and –love your data or not– that is clearly too much time with a static data set.  After 4 days, the SAN spindles ached to refresh their volumes, and Bob was wondering what was going on – Sheila was just not demanding CPU like he expected.  In a desperate attempt to save their relationship, Sheila started down the road of parallel operations, even going so far as to do some fancy partition switching to break up her largest tables into smaller chunks to process in parallel.

It was a novel idea, but she wasn’t sure it was going to guarantee consistency.  After all, she was changing the schema ever so slightly, which meant the data she was analyzing wasn’t really the same as the data in production.  To sanity check, her DBA got in touch with Paul Randal (blog | twitter), who gave some free advice.  One of those pieces of advice was to read this blog post detailing some experiments he did with CHECKDB and MAXDOP.  It turns out that trying to bring all 64 cores to bear during a CHECKTABLE is not that effective.  So Bob and Sheila conducted some experiments together, and slowly but surely they were able to find her sweet spot.  It turned out that Sheila had a pretty small sweet spot, but she was happiest at MAXDOP 4, where she saw between a fifteen and twenty-fold improvement in throughput.  Instead of 4 days, she was able to confirm her consistency in less than 2, enough of an improvement that her crazy partition-switching plans were shelved.

Bob and Sheila learned an important lesson about their relationship this week.  Quality really does top quantity, and sometimes you have to stop trying to go full bore if you want to have a better experience.

Got an opinion on this post?  Share it below!  If you thought “Wow, this is the kind of work I’d like to be doing!” and are interested in working for a company that wins awards for employee engagement, retention, and enrichment, drop me a line.  We’re looking for a DBA to add to our team.  We also have openings for .NET programmers. I love my job and the people I work with.  I’ll bet you would, too.