
The HumbleLab: Storage Spaces with Tiers – Making Pigs Fly!

I have mixed feelings about homelabs. It seems ludicrous to me that in a field that changes as fast as IT, employers do not invest in training. You would think on-the-clock time dedicated to learning would be an investment that pays itself back in spades. I also think there is something psychologically dangerous about working your 8-10 hour day and then going home and spending your evenings and weekends studying/playing in your homelab. Unplugging and leaving computers behind is pretty important; in fact, I find the more I do IT, the less interest I have in technology in general. Something, something, make an interest a career and then learn to hate it. Oh well.

That being said, IT is a fast-changing field, and if you are not keeping up one way or another, you are falling behind. A homelab is one way to keep up, plus sometimes it is kind of nice to just do stuff without attending governance meetings or submitting to the tyranny of your organization’s change control board.

Being the cheapskate that I am, I didn’t want to go out and spend thousands of my own dollars on hardware like all the cool cats in r/homelab, so I just grabbed some random crap lying around work, partly to see how much use I could squeeze out of it.

Dell OptiPlex 990 (circa 2012)

  • Intel i7-2600: 3.4GHz, 4 cores/8 threads, 256KB L2, 8MB L3
  • 16GB non-ECC 1333MHz DDR3
  • Samsung SSD PM830, 128GB, SATA 3.0 Gb/s
  • Samsung SSD 840 EVO, 250GB, SATA 6.0 Gb/s
  • Seagate Barracuda, 1TB, SATA 3.0 Gb/s

The OptiPlex shipped with just the 128GB SSD, which only had enough storage capacity to host the smallest of Windows virtual machines, so I scrounged up the two other disks from desktops that were slated for recycling. I am particularly proud of the Seagate: if the date code on the drive is to be believed, it was originally manufactured sometime in late 2009.

A bit of a pig huh? Let’s see if we can make this little porker fly.

A picture of the inside of HumbleLab

Oh yeah… look at that quality hardware and cable management. Gonna be hosting prod workloads on this baby.

I started out with a pretty simple/lazy install of Windows Server 2012 R2 and the Hyper-V role. At this point in time I only had the original 128GB SSD, which the operating system was installed on, and the ancient Seagate being utilized for .VHD/.VHDX storage.
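That “lazy install” really is just a stock OS plus one cmdlet; a minimal sketch:

    # Add the Hyper-V role and its management tools, then reboot.
    Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart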

Performance was predictably abysmal, especially once I got a SQL VM set up and “running”:

IOmeter output
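I didn’t save my exact IOmeter profile, but if you want to reproduce something roughly similar, Microsoft’s DiskSpd can approximate a small-block, mixed read/write, SQL-ish workload. The parameters below are my rough guess, not the original test:

    # 60 seconds of 8KB random IO, 30% writes, 4 threads x 8 outstanding IOs,
    # software/hardware caching disabled, latency stats on.
    # The target file and its 10GB size are placeholders.
    diskspd.exe -c10G -d60 -r -w30 -t4 -o8 -b8K -Sh -L D:\iotest.dat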

At this point, I added in the other 250GB SSD, destroyed the volume I was using for .VHD/.VHDX storage and recreated it using Storage Spaces. I don’t have much to say about Storage Spaces here since I have such a simple/stupid setup. I just created a single Storage Pool using the 250GB SSD and the 1TB SATA drive. Obviously, with only two disks I was limited to a Simple storage layout (no disk redundancy/YOLO mode). I did opt to create a larger 8GB Write Cache using PowerShell, but other than that I pretty much just clicked through the wizard in Server Manager:
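For reference, the PowerShell side of that amounts to something like the sketch below; the pool and tier names and the tier sizes are placeholders, not my exact values:

    # Pool both spare disks (the 250GB SSD and the 1TB HDD).
    $disks = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName 'VMPool' `
        -StorageSubSystemFriendlyName 'Storage Spaces*' `
        -PhysicalDisks $disks

    # One tier per media type; Storage Spaces detects SSD vs. HDD.
    $ssdTier = New-StorageTier -StoragePoolFriendlyName 'VMPool' `
        -FriendlyName 'SSDTier' -MediaType SSD
    $hddTier = New-StorageTier -StoragePoolFriendlyName 'VMPool' `
        -FriendlyName 'HDDTier' -MediaType HDD

    # Simple (non-resilient) tiered disk with the larger 8GB write-back cache.
    New-VirtualDisk -StoragePoolFriendlyName 'VMPool' -FriendlyName 'VMStore' `
        -StorageTiers $ssdTier, $hddTier -StorageTierSizes 200GB, 800GB `
        -ResiliencySettingName Simple -WriteCacheSize 8GB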


Let’s see how we did:

IOMeter Results with Storage Tiers

A marked improvement! We more than tripled our IOPS, from a snail-like 234 to a tortoise-like 820, and managed to reduce the response time from 14ms to 5ms. The latency reduction is probably the most important part. We generally shoot for under 2ms for our production workloads, but considering the hardware, 5-6ms isn’t bad at all.


What if I just run the .VHDX file directly on the shared 128GB SSD that the Hyper-V host is utilizing, without any Storage Tiers involved at all?

Hmm… not surprisingly, the results are even better, but what was surprising is by how much. We are looking at sub-2ms latency and about four and a half times more IOPS than what my Storage Spaces virtual disk can deliver.

Of course, benchmarks, especially quick and dirty ones like this, are very rarely the whole story and likely do not even come close to simulating your true workload, but they at least give us a basic picture of what my aging hardware can do: SATA = glacial, Storage Tiers with SSD caching = OK, SSD = good. It also illustrates just how damn fast SSDs are. If you have a poorly performing application, moving it over to SSD storage is likely the single easiest thing you can do to improve its performance. Sure, the existing bottleneck in the codebase or database design is still there, but does that matter anymore if everything is moving 4x faster? Like they say: hardware is cheap, developers are expensive.

I put this together prior to the general release of Server 2016, so it would be interesting to see if running this same setup on 2016’s implementation of Storage Spaces with ReFS instead of NTFS would yield better results. It also would be interesting to refactor the SQL database and, at the very least, place the TempDB, system databases and log files directly onto the host’s 128GB SSD. A project for another time I guess…

Until next time… may your pigs fly!

A flying pig powered by a rocket

Additional reading / extra credit:

Don’t Build Private Clouds? Then What Do We Build?

Give Subbu Allamaraju’s blog post Don’t Build Private Clouds a read if you have not yet. I think it is rather compelling, but also wrong in a sense. In summation: 1) your workload is not as special as you think it is, 2) your private cloud isn’t really a “cloud” since it lacks the defining scale, resiliency, automation framework, PaaS/SaaS offerings and self-service on-demand functionality that a true cloud like AWS, Azure or Google has, and 3) your organization is probably doing a poor job of building a private cloud anyway.

Now let’s look at my team. We maintain a small Cisco FlexPod environment: about 14 ESXi hosts, 1.5TB of RAM and about 250TB of storage. We support about 600 users, and I am primary for the following:

  • Datacenter Virtualization: Cisco UCS, Nexus 5Ks, vSphere, NetApp and CheckPoint firewalls
  • Server Infrastructure: Platform support for 150 VMs, running mostly either IIS or SQL
  • SCCM administration (although one of our juniors has taken over the day-to-day tasks)
  • Active Directory Maintenance and Configuration Management through GPOs
  • Team lead responsibilities, at the discretion of my manager, for larger projects with multiple groups and stakeholders
  • Escalation point for the team, point-of-contact for developer teams
  • Automation and monitoring of infrastructure and services

My day-to-day consists of work supporting these focus areas: assisting team members with a particularly thorny issue, migrating in-house applications onto new VMs, working with our developer teams to address application issues, maintaining the existing platform, holding meetings talking about all this work with my team, attending meetings talking about all this work with my managers, sending emails about all this work to the business stakeholders and a surprising amount of tier-1 support (see here and here).

If we waved our magic wand and moved everything into the cloud tomorrow, particularly into PaaS, where the real value-to-cost sweet spot seems to be, what would I have left to do? What would I have left to build and maintain?

Nothing. I would have nothing left to build.

Almost all of my job is working on back-end infrastructure, doing platform support or acting as a human API/”automation framework”. As Subbu states, I am part of the cycle of “brittle, time-consuming, human-operator driven, ticket based on-premises infrastructure [that] brews a culture of mistrust, centralization, dependency and control“.

I take a ticket saying, “Hey, we need a new VM,” and I run some PowerShell scripts to create and provision said new VM in a semi-automated fashion, then copy the contents of the older VM’s IIS directory over. I then notice that our developers are passing credentials in plaintext back and forth through web forms and .XML files between different web services, which kicks off a whole week’s worth of work to re-do all their sites in HTTPS. I then set up a meeting to talk about these changes with my team (cross-training), and if we are lucky, someone upstream actually gets to my ticket and these changes go live. This takes about three to four weeks, optimistically.
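For the curious, the “semi-automated” part is nothing fancy; think something in the spirit of this PowerCLI sketch, where every name is a placeholder for our internal ones:

    # Hypothetical provisioning step: clone a template and customize it.
    Connect-VIServer -Server 'vcenter.example.local'
    New-VM -Name 'APP-WEB-02' -Template 'Win2012R2-IIS' `
        -VMHost 'esxi01.example.local' -Datastore 'NetApp-VOL1' `
        -OSCustomizationSpec 'DomainJoin'
    Start-VM -VM 'APP-WEB-02'

The script isn’t the bottleneck; the weeks of process around it are.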

In the new world, our intrepid developer tweaks his Visual Studio deployment settings and his application gets pushed to an Azure Web App, which comes baked in with geographic redundancy, automatic scale-out/scale-up, load balancing, a dizzying array of backup and recovery options, integration with SaaS authentication providers, PCI/ISO/SOC compliance and the list goes on. This takes all of five minutes.

However, here is where I think Subbu gets it wrong: of our 150 VMs, about 50% belong to those “stateful monoliths”. They are primarily line-of-business applications with proprietary code bases that we don’t have access to, or legacy applications built on things like PowerBuilder that no one understands anymore. They are spread out across 10 to 20 VMs to provide segmentation but have huge monolithic database designs. It would cost us millions of dollars to refactor these applications into a design that could truly take advantage of cloud services in their PaaS form. Our other option would be cloud-based IaaS, which from the developer’s perspective is not that different from what we are currently doing, except that it costs more.

I am not even going to touch on our largest piece of IT spend, a line-of-business application with “large monolithic databases running on handcrafted hardware” in the form of an IBM z/OS mainframe. Now our refactoring cost is in the tens of millions of dollars.


If this magical cloud world comes to pass, what do I build? What do I do?

  • Like some kind of carrion lord, I rule over my decaying infrastructure and accumulated technical debt until everything legacy has been deprecated and I am no longer needed.
  • I go full retar… err… endpoint management. I don’t see desktops going away anytime soon despite all this talk of tablets, mobile devices and BYOD.
  • On-prem LAN networking will probably stick around, but unfortunately that is all contracted out in my organization.
  • I could become a developer.
  • I could become a manager.
  • I could find another field of work.


Will this magical cloud world come to pass?

Maybe in the real world, but I have a hard time imagining how it would work for us. We are so far behind in terms of technology, and so organizationally dysfunctional, that I cannot see how moving 60% of our services from on-prem IaaS to cloud-based IaaS would make sense, even if leadership could lay off all of the infrastructure support people like myself.

Our workloads aren’t special. They’re just stupid and it would cost a lot of money to make them less stupid.


The real pearl of wisdom…

“The state of [your] infrastructure influences your organizational culture.” Of all the things in that post, I think this is the most perceptive, as it is in direct opposition to everything our leadership has been saying about IT consolidation. The message we have continually been hearing for the last year and a half is that IT operations is a commodity service: the technology doesn’t matter, the institutional knowledge doesn’t matter, the choice of vendor doesn’t matter, the talent doesn’t matter. It is all essentially the same, and it is just a numbers game to find the implementation that is the most affordable.

As a nerd at heart, I have always disagreed with this position because I believe your technology choices determine what is possible (i.e., if you need a plane but you get a boat, that isn’t going to work out for you), but the insight here, which I had never really deeply considered, is that your choice of technology drastically affects how you do things. It affects your organization’s cultural orientation to IT. If you are a Linux shop, does that technology choice precede your dedication to continuous integration, platform-as-code and remote collaboration? If you are a Windows shop, does that technology choice precede your stuffy corporate culture of ITIL misery and on-premises commuter hell? How much does our technological means of accomplishing our economic goals shape our culture? How much indeed?


Until next time, keep your stick on the ice.

SCCM SUP Failing to Download Updates – Invalid Certificate Error

I am currently re-building my System Center lab, which includes re-installing and re-configuring a basic SCCM instance. I was in the process of getting my Software Update Point (SUP) set up when a few of my SUP groups failed to download their respective updates and deploy correctly. I dimly remember working through this issue in our production environment a few years ago when we moved from SCCM 2012 to 2012 R2, and I cursed myself for not taking better notes. So here we are, atoning for our past sins!

Here’s the offending error:

SCCM SUP Download Error Dialog - Invalid Cert


SCCM is a chatty little guy and manages to generate some 160 or so different log files, spread out across a number of different locations (see the highly recommended MSDN blog article, A List of SCCM Log Files). Identifying and locating the logs relevant to whatever issue you are troubleshooting is about half the battle with SCCM. Unfortunately, I couldn’t seem to find patchdownloader.log in its expected location of SMS_CCM\Logs. It turns out that if you are running the Configuration Manager console from an RDP session, patchdownloader.log gets stored in C:\Users\%USERNAME%\AppData\Local\Temp instead of SMS_CCM\Logs. Huh. In my case, I am RDPing to the SCCM server and running the console there, but I wonder: if I ran the console from a client workstation, would the resulting log end up locally on that workstation in my %TEMP% folder, or on the SCCM SUP server in SMS_CCM\Logs? An experiment for another day I guess.
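A quick, hypothetical way to check both candidate locations from PowerShell; the SMS_CCM path below is an assumption about where your site server is installed:

    # Tail whichever copy of PatchDownloader.log actually exists.
    $paths = "$env:TEMP\PatchDownloader.log",
             'C:\Program Files\SMS_CCM\Logs\PatchDownloader.log'
    $paths | Where-Object { Test-Path $_ } |
        ForEach-Object { Get-Content $_ -Tail 20 }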


Here are the juicy bits:


A couple of interesting things to note from here:

  • We get the actual error code (0x80073633), which you can feed into CMTrace’s Error Lookup functionality to “resolve” it back to the human-readable message the Console presents you with (see the sketch after this list for a command-line alternative). Sometimes this turns out to be useful information.
  • We get the download location for the update
  • We get the distribution package location that the update is being downloaded to
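On that first point: certutil, which ships with Windows, can translate many HRESULTs from the command line without firing up CMTrace, though there is no guarantee it has a friendly string for every SCCM-specific code:

    # Decode an HRESULT; output depends on whether Windows knows this code.
    certutil -error 0x80073633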

If I manually browse to the wsus.ds.download.windowsupdate.com URL, I manage to download the update without issue, and with no certificate validation problems, which is what one would expect considering that the connection is going over HTTP according to the log. Makes one wonder how the resulting error is related to an “invalid certificate”…

OK. How do I fix it? Well, like most things SCCM, the solution is as stupid as it is brilliant: manually download the update from the Microsoft Update Catalog. Then go find the offending update in its respective Software Update Group by referencing the KB number and download it again, but this time set your Download Location to the directory that already contains it.

Whoops. Didn’t work.

Take a look at the first attempt to download the content… SCCM is looking for Windows10.0-KB3172989-x64.cab so that it can be downloaded into my %TEMP% directory and then eventually moved off to the Deployment Package’s source location at \\SCCM\Source Files\Windows\Updates\2016.

The file I downloaded is not named Windows10.0-KB3172989-x64.cab – it’s actually an .msu file. Use 7-Zip or a similar tool to pull the .cab file out of it, and now SCCM SUP should successfully “download” the update and ship it off to the source location for your Deployment Package.
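If you don’t have 7-Zip handy, the expand utility that ships with Windows can do the same job; the paths below are placeholders for wherever you saved the .msu:

    # Extract all inner .cab files; the Windows10.0-KB3172989-x64.cab
    # that SCCM wants should be among them.
    expand "C:\Temp\windows10.0-kb3172989-x64.msu" -F:*.cab "C:\Temp"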