Tag Archives: windows server

Managing the Windows Time Service with SCCM’s Configuration Items

Keeping accurate and consistent time is important in our line of business. Event and forensic correlation, authentication protocols like Kerberos that rely on timestamps, and even the simple coordination of things like updates all require accurate timekeeping. Computers are unfortunately notoriously bad at keeping time, so we have protocols like NTP and time synchronization hierarchies to keep all the clocks ticking. In an Active Directory environment this is one of those things that (if set up correctly on the Domain Controller holding the PDC Emulator FSMO role) just kind of takes care of itself, but what if you have WORKGROUP machines? Well, you’re in luck. You can use SCCM’s Configuration Items to manage configuration settings on devices that are outside of your normal domain environment, in isolated networks and beyond the reach of tools like GPOs.

There are really two pieces to this. We need to ensure that the correct NTP servers are being used so all our domain-joined and WORKGROUP machines get their time from the same source, and we need to ensure that our NTP client is running.

 

Setting the correct NTP Servers for the Windows Time Service

To get started, create a Configuration Item with a Registry Value based setting. The NtpServer value sets which NTP servers the Windows Time Service (W32Time) pulls from. We can manage it like so:

  • Setting Type: Registry Value
  • Data Type: String
  • Hive Name: HKEY_LOCAL_MACHINE
  • Key Name: SYSTEM\CurrentControlSet\Services\W32Time\Parameters
  • Value Name: NtpServer

The corresponding Compliance Rule is straightforward. We just want to ensure that the same time servers we are using in our domain environment are set here as well.

  • Rule type: Value
  • Setting must comply with the following value: yourtimeserver1,yourtimeserver2
  • Remediate non-compliant rules when supported: Yes
  • Report noncompliance if this setting is not found: Yes

  • Rule Type: Existential
  • Registry value must exist on client devices: Yes

 

The Setting should require the existence of the NtpServer value and set it as specified. If it is set to something else, the value will be remediated back to your desired value. You can learn more about setting the NtpServer registry value and controlling the polling interval at this Microsoft MSDN blog post.
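If you want to spot-check a machine by hand before building the Configuration Item, a quick sketch like this reads and sets the same value (the server names are placeholders):

  # Read the NTP server list the Windows Time Service is currently using
  Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\W32Time\Parameters' -Name NtpServer

  # Point W32Time at your internal time servers (placeholder names) and force a resync
  w32tm /config /manualpeerlist:"yourtimeserver1 yourtimeserver2" /syncfromflags:manual /update
  w32tm /resync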

 

Ensuring the Windows Time Service is Running

If the time service isn’t running then you are not going to have accurate timekeeping! This is further complicated by the behavior of the Windows Time Service on WORKGROUP computers. The time service will stop immediately after system startup, even if the Startup Type is set to Automatic. The W32Time service is configured as a Trigger-Start service in order to reduce the number of services running in Windows 7 and Server 2008 R2 and above. The trigger (of course) that causes it to automatically start is whether or not the machine is domain-joined, so for WORKGROUP machines the service status is set to Stopped. Not very helpful in our scenario. Let’s change that.

We can start by just performing a simple WQL query to see if the W32Time service is running:

  • Setting type: WQL Query
  • Data type: String
  • Namespace: root\cimv2
  • Class: Win32_Service
  • Property: Name
  • WQL query WHERE clause: Name like "%W32Time%" and State like "%Running%"
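You can sanity-check that WHERE clause locally before pasting it into the CI; a quick sketch using CIM:

  # Run the same WQL the Configuration Item will use; returns nothing if W32Time isn't running
  Get-CimInstance -Namespace 'root\cimv2' -Query "SELECT Name, State FROM Win32_Service WHERE Name LIKE '%W32Time%' AND State LIKE '%Running%'"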

It’s a bit backward but if the query comes back with no results then the configuration state we are looking for does “not exist” and so we’ll mark it as non-compliant. It’s not intuitive but it works:

  • Rule Type: Existential
  • The setting must exist on client devices: Yes

 

This gives us the status of the Windows Time Service but we still need to remove the [DOMAIN JOINED] trigger so the service will actually start automatically. PowerShell to the rescue!

  • Setting Type: Script
  • Data Type: Integer
  • Script: PowerShell

Discovery Script
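A minimal sketch of the discovery logic (the specific non-zero return codes are arbitrary; compliant machines return 0):

  # Discovery: return 0 only if W32Time starts automatically, has no [DOMAIN JOINED] trigger and is running
  $svc = Get-Service -Name W32Time
  $wmi = Get-CimInstance -ClassName Win32_Service -Filter "Name = 'W32Time'"
  $triggers = (sc.exe qtriggerinfo w32time | Out-String)

  if ($wmi.StartMode -ne 'Auto')        { return 1 }   # wrong startup type
  if ($triggers -match 'DOMAIN JOINED') { return 2 }   # the domain-join trigger is still present
  if ($svc.Status -ne 'Running')        { return 3 }   # service is stopped
  return 0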

Remediation Script
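And the matching remediation sketch (note that sc.exe’s trigger delete removes all start/stop triggers from the service, which is the point here):

  # Remediation: clear the start triggers, set the service to Automatic and start it
  sc.exe triggerinfo w32time delete | Out-Null
  Set-Service -Name W32Time -StartupType Automatic
  Start-Service -Name W32Time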

 

  • Value returned by the specified script: Equals 0
  • Run the specified remediation script when this setting is noncompliant: Yes

The Discovery script will return various non-compliant values depending on the configuration state of the endpoint. Any non-zero value causes the Remediation script to run, which sets the service’s Startup Type to Automatic, removes the [DOMAIN JOINED] trigger and starts the service.

I hope this post helps you manage your time configuration on all those weird one-off WORKGROUP machines that we all seem to have floating around out there.

Until next time, stay frosty.

Using Azure Cool Blob Storage with Veeam 9.5u3

“Hey, AKSysAdmin. I want to push all our backups to cheap Azure storage. Can you do a proof-of-concept for me and a quick cost write up?”

We are all eagerly awaiting the implementation of Veeam’s Scale-Out Backup Repository Archive Tier functionality in v10. The Archive Tier functionality will allow Veeam customers to leverage “cheap” cloud storage like AWS’s S3 and Glacier and Azure’s rather hilariously named Cool Blob Storage. In the meantime, if you want to use Azure Blob Storage right now, what are your options?

  • A “middleware” appliance like NetApp’s AltaVault, Microsoft’s StorSimple or a VTL
  • Roll your own IaaS solution in Azure

The first option is pretty straightforward. You buy an appliance that provides a storage target for your on-prem Veeam Backup and Replication server and send your Backup Copy jobs to that Backup Repository. Once your backups are located there, “magic” happens that handles the hot/warm/cold tiering of the data out to Azure as well as the conversion from structured data to unstructured data.

The second option is a little more complicated. You’ll need to spin up an Azure IaaS VM, attach blob storage to it and make it usable to your on-prem Veeam infrastructure.

 

Before we go too much further we should probably talk about the different blob storage types.

Block Blobs

These are pretty much what they sound like: block based storage of large contiguous files. They work great for things that are not accessed via randomized reads and writes. The individual blocks stored in each blob are referenced by a BlockID and can be uploaded/modified/downloaded simultaneously, assembled and then committed with a single operation. You can see how well this type of storage lends itself to streaming services where large files are split into smaller pieces and uploaded or downloaded sequentially. The maximum size (as of writing) for a block blob is about 4.75TBs.

Page Blobs

Page blobs are composed of 512-byte pages optimized for random read and write operations. Changes to the pages require immediate commits unlike block blobs. Page blobs work great for things like virtual disks where some other mechanism is organizing the data inside the blob. Page blobs are used for the underlying storage for Azure IaaS data disks. The maximum size (as of writing) for a page blob is 8TBs.

Azure Blob Storage Tiers: Hot, Cool and Archive

Azure Storage Accounts allow you to group all your various pieces of blob storage together for the purposes of management and billing. With Blob and General Purpose v2 Storage Accounts you can elect to use storage tiers. Cool Blob Storage has lower storage costs (and higher access costs) and is intended for things like short-term backup and disaster recovery data. Archive storage has even lower storage costs (and even higher access costs) and is designed for data that can tolerate hours of potential retrieval time. Archive storage is intended for long-term backups, secondary backup storage or data that has archival requirements. In order to read data in archive storage, the blob needs to be rehydrated, which can take up to 15 hours. Blob size is also a factor in rehydration time.

I should mention that the option to have your blobs stored in locally redundant storage (LRS) or geo-redundant storage (GRS) exists for all of these flavors.

 

This is all great but how do I use it?

Well if you went with the first option you break out your wallet for a capital purchase and follow Veeam’s Deployment Guide for AltaVault or vendor equivalent.

The second option is a little more involved. You need to deploy an instance of Veeam’s Cloud Connect for the Enterprise, add some data disks to the resulting Azure IaaS VM, configure them in Windows, set up a Backup Repository using them and finally add the resulting repository to your on-prem install as a Cloud Backup Repository. For the price of the IaaS VM and the underlying storage you now have a cloud-based backup repository using Azure blob storage.

Here’s why you probably don’t want to do this.

Veeam will support Azure Cool Blob storage fairly soon, so you have to ask yourself if it makes sense to buy a purpose built “middleware” appliance to bridge the gap. A few years ago it would have been a no-brainer, but with more and more backup vendors supporting cloud storage natively it seems like the market for these devices will shrink.

The second option has some issues as well. Your freshly created Cloud Backup Repository is backed by Azure IaaS data disks which sit on top of page blob storage. Guess what page blobs don’t support? Storage tiers. If you create a storage account in the cool tier you’ll notice the only container option you have is for block blobs. If you try and add a data disk to your IaaS VM using a blob storage account you get this error:

Not going to work.

What if you set up an Azure Files share and utilized it instead of a data disk? Same problem. Only block blob storage supports storage tiers at this point in time.

What if you just provisioned extra data disks for your VM and used Storage Spaces and ReFS to get your storage? Well, that will sort of work but there are many limitations:

  • Data disks are limited to 4TBs
  • Most IaaS VMs only support 15 data disks
  • If you need more than 15 data disks your IaaS VM is going to get really expensive
  • You have to correctly manage and configure a VM with 15 disks using Storage Spaces
  • All your disks are running on page blob storage which is not really that cheap

The “roll-your-own-IaaS” solution will be performance and capacity limited right out of the gate. It will be complicated and potentially brittle and it doesn’t take advantage of the pricing of storage tiers making it rather pointless in my opinion.

Why you still may want to do this

If the backup dataset that you want to archive is fairly small this might still make sense, but if that’s the case I would forgo the entire exercise of trying to cram a round peg into a square hole and look very seriously at a DRaaS provider like Iland, where you will get so much more than just cloud storage for your backups at what will likely be a competitive price.

Why even if you still want to do this it’s probably not a good idea

Everything is elastic in the cloud except the bill, and unless you have an accurate picture of what you really need you might be surprised once you get that bill. There are a bunch of things that are not really accounted for in your traditional on-premise billing structure: IP addresses, data transfer between virtual networks, IOPS-limited performance tiers and so on. In short, there is a lot more to doing the cost analysis than just comparing the cost of storage.

Speaking of which, let’s take a look at the current storage prices and see if they really are “cheap”. These prices are based on the Azure Storage Overview pricing and are for the WestUS2 region of the Azure commercial cloud.

Standard Page Blobs (Unmanaged Disks)

  • LRS: $0.045 per GB
  • ZRS: N/A
  • GRS: $0.06 per GB
  • RA-GRS: $0.075 per GB

This also comes with a $0.0005 per 10,000 transactions charge when Standard Page Blobs are attached to a VM as an Unmanaged Disk.

 

Block Blob Pricing

  • First 50 TB / month: Hot $0.0184 per GB, Cool $0.01 per GB, Archive $0.002 per GB
  • Next 450 TB / month: Hot $0.0177 per GB, Cool $0.01 per GB, Archive $0.002 per GB
  • Over 500 TB / month: Hot $0.017 per GB, Cool $0.01 per GB, Archive $0.002 per GB

There are also some operational charges and data transfer costs

  • Write Operations (per 10,000): Hot $0.05, Cool $0.10, Archive $0.10
  • List and Create Container Operations (per 10,000): Hot $0.05, Cool $0.05, Archive $0.05
  • Read Operations (per 10,000): Hot $0.004, Cool $0.01, Archive $5
  • All other Operations, except Delete which is free (per 10,000): Hot $0.004, Cool $0.004, Archive $0.004
  • Data Retrieval (per GB): Hot Free, Cool $0.01, Archive $0.02
  • Data Write (per GB): Hot Free, Cool $0.0025, Archive Free

 

To replace our rather small GFS tape set we’d need somewhere north of 100TBs. The first problem is the limitation requiring us to use page blob backed data disks: we won’t even be able to meet our capacity requirements (4TBs per data disk * 15 data disks per IaaS VM = 60TBs).

If we put aside the capacity issue, let’s look at a notional cost just for comparison’s sake: 100TBs * 1024 = 102,400 GBs * $0.045 = $4,608 per month. This doesn’t include the cost of the IaaS VM and associated infrastructure you may need (IP addresses, Virtual Networks, Site-to-Site VPN, etc.) nor any of the associated transaction charges.

The storage charge is more than expected since we’re not really using the technology as intended. Block blob storage in the archive tier gets us a much more respectable number: 100TBs * 1024 = 102,400 GBs * $0.002 = $204.80 per month. BUT we need to factor in the cost of some kind of “middleware” appliance to utilize this storage, so tack on an extra $40-$60k (it’s hard to pin this cost down since it will come via a VAR so I could be totally off). If we “op-ex” that cost (call it $50k) over three years it’s an additional $1,388.88 a month, bringing your total to $1,593.68 per month for “cheap” storage.

OK. Looks like our “cheap” cloud storage may not be as cheap as we thought. Let’s take a look at our on-premise options.

LTO data tapes… personally I loathe them but they have their place, particularly for archiving GFS data sets that are small. A 24 slot LTO-6 tape library like Dell’s TL2000 is around $20k, and with 40 LTO-6 tapes providing a raw capacity of 100TBs (not including compression) the whole thing comes to about $602 per month over three years.

What about on-premise storage? A Dell MD1400 with 12 10TB 7.2K RPM NL-SAS drives is somewhere in the $15-$20k range and brings 80TBs of usable storage in a RAID-60 configuration. Allocated out over three years this comes to roughly $555 per month.

Summary

Technology choices are rarely simple and no matter how much executive and sales folks push “cloud-first” like it’s some kind of magic bullet, cloud services are a technology like any other with distinct pros and cons, use cases and pitfalls. Getting an accurate picture of how much it will cost to shift a previously capital expense based on-premise service to cloud services is actually a fairly difficult task. There are a tremendous number of things that you get “included” in your on-premise capital purchases that you have to pay for every month once that service is in the cloud, and unless you have a good grasp on them you will get a much bigger bill than you expected. I really recommend SysAdmin1138’s post about the challenges of moving an organization to this new cost model if you are considering any significant cloud infrastructure.

If you want to use Azure Blob Storage right now for Veeam the answer is: you can, but it’s not going to work the way you want, it’s probably going to cost more than you think and you’re not really using the technology the way it was intended to be used, which is asking for trouble. You could buy some middleware appliance, but with Scale-Out Backup Repository Archive Tier functionality on the immediate horizon this sounds like a substantial infrastructure investment that you’re only going to get a limited return of business value on. It might make sense to wait.

Finally, a little bit of a disclaimer. I tried to pull the pricing numbers from old quotes that I have (hence the LTO-6 and LTO-8 tapes) to try to keep the math grounded in something like reality. Your prices may vary wildly and I highly encourage you to compare all the different cost options and spend some time trying to capture all of the potential costs of cloud services that may be hidden (i.e., it’s not just paying for the storage). Cloud services and their pricing are constantly changing too, so it’s worth checking with Microsoft to get these numbers from the source.

Until next time, stay frosty.

Scheduling Backups with Veeam Free and PowerShell

Veeam Free Edition is an amazing product. For the low price of absolutely zero you get a whole laundry list of enterprise-grade features: VeeamZip (Full Backups), granular and application-aware restore of items, native tape library support and direct access to NFS-based VM storage using Veeam’s NFS client. One thing that Veeam Free doesn’t include however is a scheduling mechanism. We can fix that with a little bit of PowerShell that we run as a scheduled task.

I have two scripts. The first one loads the Veeam PowerShell Snap-In, connects to the Veeam server, gets a list of virtual machines and then backs them up to a specified destination.
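Something along these lines (a sketch: the host name, destination folder and compression level are placeholders, and the entity lookup assumes a Hyper-V host, so swap in Find-VBRViEntity for vSphere):

  # Load the Veeam snap-in and connect to the local Backup & Replication server
  Add-PSSnapin -Name VeeamPSSnapIn -ErrorAction SilentlyContinue
  Connect-VBRServer -Server 'localhost'

  $destination = 'D:\VeeamZip'                  # backup target folder (placeholder)
  $hvHost = Get-VBRServer -Name 'HUMBLELAB'     # Hyper-V host as registered in Veeam (placeholder)

  # Grab every VM Veeam can see on that host and VeeamZip it to the destination
  $vms = Find-VBRHvEntity -Server $hvHost | Where-Object { $_.Type -eq 'Vm' }
  foreach ($vm in $vms) {
      Start-VBRZip -Entity $vm -Folder $destination -Compression 5 -DisableQuiesce
  }

  Disconnect-VBRServer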

 

I had Veeam set up on a virtual machine running on the now defunct HumbleLab. One of the disadvantages of this configuration is that I don’t have separate storage to move the resulting backup files onto. You could solve this by simply using an external hard drive but I wanted something a little more… cloud-y. I set up Azure Files so I could connect to cheap, redundant and, most importantly, off-site, off-line storage via SMB3 to store a copy of my lab backups. The biggest downside to this is security. Azure Files is really not designed to be a full-featured replacement for a traditional Windows file server. It’s really more of an SMB-as-a-Service offering designed to be programmatically accessed by Azure VMs. SMB3 provides transit encryption but you would still probably be better off using a Site-to-Site VPN between your on-prem Veeam server and a Windows file server running as a VM in Azure, or by using Veeam’s Cloud Connect functionality. There’s also no functionality replacing or replicating NTFS permissions. The entire “security” of your Azure Files SMB share rests in the storage key. This is OK for a lab but probably not OK for production.

Here’s the script that fires off once a week and copies the backups out to Azure Files. For something like my lab it’s a perfect solution.
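A sketch of that weekly copy job (the storage account name, share name and key are placeholders; the storage key is the entire “security” here, so treat it accordingly):

  # Map the Azure Files share over SMB3 using the storage account key (placeholders below)
  $storageAccount = 'mystorageacct'
  $shareName      = 'veeambackups'
  $storageKey     = ConvertTo-SecureString 'STORAGE-ACCOUNT-KEY' -AsPlainText -Force
  $credential     = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "Azure\$storageAccount", $storageKey

  New-PSDrive -Name Z -PSProvider FileSystem -Root "\\$storageAccount.file.core.windows.net\$shareName" -Credential $credential | Out-Null

  # Copy the last week's worth of VeeamZip backups out to Azure Files
  Get-ChildItem -Path 'D:\VeeamZip' -Filter *.vbk |
      Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-7) } |
      Copy-Item -Destination 'Z:\' -Force

  Remove-PSDrive -Name Z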

 

Until next time, stay frosty!

Five things to not screw up with SCCM

With great power comes great responsibility

Uncle Ben seemed like a pretty wise dude when he dropped this particular knowledge bomb on Peter Parker. As sysadmins we should already be aware of the tremendous amount of power that has been placed into our hands. Using tools like SCCM further serves to underline this point, and while I think SCCM is an amazing product and has the ability to be a fantastic force multiplier, you can also reduce your business’ infrastructure to ashes within hours if you use it wrong. I can think of two such events where an SCCM administrator has mistakenly done tremendous damage: in 2014 a Windows 7 deployment re-imaged most of the computers, including servers, at Emory University, and in another unfortunate event a contractor managed to accomplish the same thing at the Commonwealth Bank of Australia back in the early 2000s.

There are a few things you can do to enjoy the incredible automation, configuration and standardization benefits of SCCM while reducing your likelihood of an R.G.E.

Dynamic Collection Queries

SCCM is all about performing an action on large groups of computers. Therefore it is absolutely imperative that your Collections ACTUALLY CONTAIN THE THINGS YOU THINK THEY DO. Your Collections need to start large and gradually get smaller using a sort of matryoshka doll scheme based on dynamic queries and limiting Collections. You should double/triple/quadruple check your dynamic queries to make sure they are doing what you think they are doing when you create them. It is wise to review these queries on a regular basis to make sure an underlying change in something like Active Directory OU structure or naming convention hasn’t caused your query to match 2000 objects instead of your intended 200. Finally, I highly recommend spot-checking the members of your targeted Collection before deploying anything particularly hairy and/or when deploying to a large Collection because no matter how diligent we are, we all make mistakes.

Maintenance Windows

“The bond traders are down! The bond traders are down! Hue and cry! Panic! The CIO is on his way to your boss’s office!” Not what you want to hear at 7:00 AM as you are just starting on your first cup of coffee, huh? You can prevent this by making sure your Maintenance Windows are set up correctly. SCCM will do what you tell it to do and if you tell it to allow the agent to reboot at 11:00AM instead of 11:00PM, that’s what’s going to happen.

I like setting up an entirely separate Collection hierarchy that is used solely for setting Maintenance Windows, including my other Collections as members. This prevents issues where the same Collection is used for both targeting and scheduling. It also reduces Maintenance Window sprawl where machines are members of multiple Collections all with different Maintenance Windows. It’s important to consider that Maintenance Windows are “union-ed” so if you have a client in Collection A with a Maintenance Window of 20:00 – 22:00 and in Collection B with a Maintenance Window of 12:00 – 21:00 that client can reboot anywhere between 12:00 – 22:00. There’s nothing more annoying than a workstation that was left in a forgotten testing Collection with a Maintenance Window spanning the whole business day – especially after the technician was done testing and that workstation was delivered to some Department Director.

I am also a huge fan of the idea of a “Default Maintenance Window” where you have a Maintenance Window that is in the past and non-recurring that all SCCM clients are a member of. This means that no matter what happens with a computer’s Collection membership it isn’t just going to randomly reboot if it has updates queued up and its current Maintenance Window policy is inadvertently removed.

Last but not least, and this goes for really anything that is scheduled in SCCM, pay attention to date and time. Watch for AM versus PM, 24-hour time vs. 12-hour time, new day rollover (i.e., 08/20 11:59PM to 08/21 12:00AM) and UTC versus local time.

Required Task Sequences

Of all the things in SCCM this is probably one of the most dangerous. Task Sequences generally involve re-partitioning, re-formatting and re-imaging a computer, which has the nice little side effect of removing everything previously on it. You’ll notice that both of those incidents I mentioned at the start of this post were caused by Task Sequences that inadvertently ran on a much larger group of computers than was intended. As a general guideline, I counsel staff to avoid deploying Task Sequences as Required outside of the Unknown Computers Collection. The potential to nuke your line of business application servers and replace them with Windows 10 is reduced if you have done your fundamentals right in setting up your Collections, but I still recommend deploying to small Collections, making your Deployment Available instead of Required (especially if you are testing), restricting who can deploy Task Sequences and password protecting the Task Sequence. I would much rather reboot servers to clear the WinPE environment than recover them from backups.

Automatic Deployment Rules

Anything in SCCM that does stuff automatically deserves some scrutiny. Automatic Deployment Rules are another version of Dynamic Collection Queries. You want to use them and they make your life easier, but you need to be sure that they do what you think they do, especially before they blast out this month’s patches to the All Clients collection instead of the Patch Testing collection. Deployment templates can make it harder to screw up your SUP deployments, and once again pay attention to the advertisement and deadline times, watching for mistakes with UTC vs. local time or +1 day rollover, the Maintenance Window behavior and which collection you are deploying to. And please, please, please test your SUP groups first before deploying them widely. You too can learn from our mistakes.

Source Files Management and Organization

A messy boat is a dangerous boat. There is a tendency for the source files directory that you are using to store all your installers for Application and Package builds to just descend into chaos over time. This makes it increasingly difficult to figure out which installers are still being used and what stuff was part of some long forgotten test. What’s important here is that you have a standard for file organization and you enforce it with an iron fist.

I like to break things out like this:

A picture depicting the Source Files folder structure

Organizing your source files… It’s a Good Thing.

It’s a pretty straightforward scheme but you get the idea: Applications – Vendor – Software Title – Version and Bitness – Installer. You may need to add more granularity to your Software Updates Deployment Package folders depending on your available bandwidth and how many updates you are deploying in a single SUP group. We have had good results with grouping them by year, but then again we are not an agency with offices all over rural Alaska.

 

Mitigation Techniques

There are a few techniques you can use to prevent yourself from doing something terrible.

Role-based Access Control

You can think of Security Scopes as the largest possible number of clients a single admin can break. If you have a big enough team, the clever use of RBAC will allow you to limit how much damage individual team members can do. For example: you could divide your 12 person SCCM team into three sub-teams and use RBAC to limit each sub-team to only being able to manage 1/3 of your clients. You could take this idea a step further and give your tier-1 help desk the ability to do basic “non-dangerous” actions but still allow them the ability to use SCCM to perform their job. This is pretty context specific but there is a lot you can do with RBAC to limit the potential scope of an Administrator’s actions.

Application Requirements (Global Conditions)

You can use Application Requirements as a basic mechanism to prevent bad things from happening if they are deployed to the wrong Collection inadvertently.

Look at all these nice, clean servers… it would be a shame if someone accidentally deployed the Java JRE to all of them, wouldn’t it? Well, if you put in a Requirement that checks the value of ProductType in the Win32_OperatingSystem WMI class to ensure the client has a workstation operating system then the Application will fail its Requirements check and won’t be installed on those servers.
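The Global Condition behind that Requirement is just a WQL query against Win32_OperatingSystem, and you can test the logic locally before building it; a quick sketch (ProductType 1 = workstation, 2 = domain controller, 3 = member server):

  # Equivalent WQL: SELECT ProductType FROM Win32_OperatingSystem WHERE ProductType = 1
  $os = Get-CimInstance -ClassName Win32_OperatingSystem
  if ($os.ProductType -eq 1) {
      'Workstation OS - the Requirement would pass'
  } else {
      'Server OS - the Requirement would fail and the Application will not install'
  }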

 

There’s so much in WMI that you could build WQL queries that prevent “dangerous” applications from meeting their Requirements on any client outside the intended deployment.

 

PowerShell Killswitch

SCCM is a pull-based architecture. An implication of this is that once clients have a bad policy they are going to act on it. The first thing you should do if you discover a policy stomping on your clients is to try to limit the damage by preventing unaffected clients from pulling it. A simple PowerShell script that stops the IIS App Pools backing your Management Points and Distribution Points will act as a crude but effective kill switch. By having this script prepped and ready to go you can immediately stop the spread of something bad and then focus your efforts on correcting the mistake.
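A rough sketch of such a kill switch (the server names are placeholders; this just slams the door on IIS so clients can no longer pull policy or content):

  # Emergency stop: take the IIS app pools on every MP/DP offline so clients stop pulling policy
  $siteServers = 'SCCM-MP01', 'SCCM-DP01', 'SCCM-DP02'     # placeholder server names

  Invoke-Command -ComputerName $siteServers -ScriptBlock {
      Import-Module WebAdministration
      Get-ChildItem IIS:\AppPools | Where-Object { $_.State -eq 'Started' } |
          ForEach-Object { Stop-WebAppPool -Name $_.Name }
      Stop-Service -Name W3SVC -Force                      # belt and suspenders: stop IIS entirely
  }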

Sane Client Settings

There is a tendency to crank up some of the client-side polling frequencies in smaller SCCM implementations in order to make things “go faster”, however another way to look at the polling interval is that it is the period of time it takes for all of your clients to receive a bad policy and possibly act on it. If your client policy polling interval is 15 minutes, that means within 15 minutes you will have re-imaged all your clients if you really screwed up and deployed a Required Task Sequence to All Systems. The longer the polling interval, the more time you have to identify a bad policy, stop it and begin rebuilding before it has nuked your whole fleet.

Team Processes

A few simple soft processes can go a long way. If you are deploying out an Application or Updates to your whole fleet, send out a notification to your business leaders. People are generally more forgiving of mistakes when they are notified of significant changes first. Perform a gradual roll-out over a week or two instead of blasting out your Office 365 installation application to all 500 workstations at once. Setting sane scheduling and installation deadlines in your Deployments helps here too.

If you are doing something that could be potentially dangerous, grab a coworker and do pilot/co-pilot for the deployment. You (the pilot) perform the work but you walk your coworker (the co-pilot) through each step and have them verify it. Putting a second pair of eyes on a deployment avoids things like inadvertently clicking the “Allow clients to restart outside of Maintenance Windows” checkbox. Next time you need to do this deployment switch roles – Bam! Instant cross training!

Don’t be in a hurry. Nine times out of ten the dangerous thing is simple to deploy, but those simple settings cannot be wrong. Take your time to do things right and push back when you are given unrealistic schedules or asked to deploy things outside of your roll-out process. In the mountains we like to say, slow is fast and fast is dead. In SCCM I like to say, slow is fast, and fast is fired.

Read-Only Friday is the holiest of days on the Sysadmin calendar. Keep it with reverence and respect.

Consider enabling the High Risk Deployment Setting. If you do this make sure you tune the settings so your admins don’t get alert fatigue and just learn to click next, next, finish or eventually they will click next, next, finish and go “oops”.

 

I hope this is helpful. If you have other ideas on how not to blow up everything with SCCM feel free to comment. I’m always up for learning something new!

Until next time, stay frosty.

 

 

 

The HumbleLab: Windows Server 2016, ReFS and “no sufficient eligible resources” Storage Tier Errors

Well, that didn’t last too long did it? Three months after getting my Windows Server 2012 R2 based HumbleLab set up, I tore it down to start fresh.

As a refresher, The HumbleLab lives on some pretty humble hardware:

Dell OptiPlex 990 (circa 2012)

  • Intel i7-2600, 3.4GHz 4 Cores, 8 Threads, 256KB L2, 8MB L3
  • 16GBs, Non-ECC, 1333MHz DDR3
  • Samsung SSD PM830, 128GBs SATA 3.0 Gb/s
  • Samsung SSD 840 EVO 250GBs SATA 6.0 Gb/s
  • Seagate Barracuda 1TB SATA 3.0 Gb/s

However, I did manage to scrounge up a Hitachi/HGST Ultrastar 7K3000 3TB SATA drive (manufactured in April 2011) in our parts bin to swap places with the eight-year-old Seagate drive. Not only is the Hitachi drive newer but it also has three times as much capacity, bringing a whopping 3TBs of raw storage to the little HumbleLab! Double win!

My OptiPlex lacks any kind of real storage management and my Storage Pool was configured with a Simple Storage Layout, which just stripes the data across all the drives in the Storage Pool. It should also go without saying that I am not using any of Storage Spaces’ Failover Clustering or Scale-Out functionality. I couldn’t think of a simple way to swap my SATA drives other than to export my Virtual Machines, destroy the Storage Pool, swap the drives and recreate it. The only problem is I didn’t really have any readily available temporary storage that I could dump my VMs on, and my lab was kind of broken, so I just nuked everything and started over with a fresh install of Server 2016, which I wanted to upgrade to anyway. Oh well, sometimes the smartest way forward is kind of stupid.

Not much to say about the install process, but I did run across the same “storage pool does not have sufficient eligible resources” issue when creating my Storage Pool.

Neat! There’s still a rounding error in the GUI. Never change Microsoft. Never change.

According to the Internet’s most accurate source of technical information, Microsoft’s TechNet Forums, there is a rounding error in how disk sizes are presented in the wizard. What seems to happen is that when you want to use all 2.8TBs of your disk, the numbers don’t match up exactly with the actual capacity, and consequently the wizard fails as it tries to create a Storage Tier bigger than the underlying disk. I guess. I mean, it seems plausible at least. Supposedly specifying the size in GBs or even MBs will work, but naturally it didn’t work for me and I ended up creating my new Virtual Disk using PowerShell. I slowly backed off the size of my Storage Tiers from the total capacity of the underlying disks until it worked, ending up with 3GBs worth of slack space. A little disappointing that the wizard doesn’t automagically do this for you and doubly disappointing that this issue is still present in Server 2016.

Here’s my PowerShell snippet:
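The gist of it (a sketch: the pool and tier friendly names are examples, and the tier sizes are backed off a few GBs from each disk’s raw capacity so the create succeeds):

  # One tier per media type in the existing pool
  $ssdTier = New-StorageTier -StoragePoolFriendlyName 'HumblePool' -FriendlyName 'SSDTier' -MediaType SSD
  $hddTier = New-StorageTier -StoragePoolFriendlyName 'HumblePool' -FriendlyName 'HDDTier' -MediaType HDD

  # Tier sizes slightly under the raw disk capacity so the wizard's rounding problem doesn't bite
  New-VirtualDisk -StoragePoolFriendlyName 'HumblePool' -FriendlyName 'VMStorage' `
      -StorageTiers $ssdTier, $hddTier -StorageTierSizes 225GB, 2790GB `
      -ResiliencySettingName Simple

  # Bring the new disk online and format it with ReFS
  Get-VirtualDisk -FriendlyName 'VMStorage' | Get-Disk |
      Initialize-Disk -PartitionStyle GPT -PassThru |
      New-Partition -DriveLetter V -UseMaximumSize |
      Format-Volume -FileSystem ReFS -NewFileSystemLabel 'VMStorage'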

 

Now for the big reveal. How’d we do?

Not bad at all for running on junk! We were able to squeeze a bit more go juice out of the HumbleLab with Server 2016 and ReFS! We bumped the IOPS up to 2240 from 880 and reduced latency down to sub 2ms numbers from 4ms which is amazing considering what we are running this on.

I think that this performance increase is largely due to the combination of how Storage Tiers and ReFS are implemented in Server 2016 and not due to ReFS’s block cloning technology, which is focused on optimizing certain types of storage operations associated with virtualization workloads. As I understand it, Storage Tiers were previously “passive” in the sense that a scheduled task would move hot data onto SSD tiers and cooling/cold data back onto HDD tiers, whereas in Server 2016 Storage Tiers and ReFS can do real-time storage optimization. Holy shmow! Windows Server is starting to look like a real operating system these days! There are plenty of gotchas of course, and it is not really clear to me whether they are talking about Storage Spaces / Storage Tiers or Storage Spaces Direct, but either way I am happy with the performance increase!

Until next time!

 

The HumbleLab: Storage Spaces with Tiers – Making Pigs Fly!

I have mixed feelings about homelabs. It seems ludicrous to me that, in a field that changes as fast as IT, employers do not invest in training. You would think on-the-clock time dedicated to learning would be an investment that would pay itself back in spades. I also think there is something psychologically dangerous in working your 8-10 hour day and then going home and spending your evenings and weekends studying/playing in your homelab. Unplugging and leaving computers behind is pretty important; in fact I find the more and more I do IT the less interest I have in technology in general. Something, something, make an interest a career and then learn to hate it. Oh well.

That being said, IT is a fast changing field and if you are not keeping up one way or another, you are falling behind. A homelab is one way to do this, plus sometimes it is kind of nice to just do stuff without attending governance meetings or submitting to the tyranny of your organization’s change control board.

Being the cheapskate that I am, I didn’t want to go out and spend thousands of my own dollars on hardware like all the cool cats in r/homelab, so I just grabbed some random crap lying around work, partly just to see how much use I could squeeze out of it.

Dell OptiPlex 990 (circa 2012)

  • Intel i7-2600, 3.4GHz 4 Cores, 8 Threads, 256KB L2, 8MB L3
  • 16GBs, Non-ECC, 1333MHz DDR3
  • Samsung SSD PM830, 128GBs SATA 3.0 Gb/s
  • Samsung SSD 840 EVO 250GBs SATA 6.0 Gb/s
  • Seagate Barracuda 1TB SATA 3.0 Gb/s

The OptiPlex shipped with just the 128GB SSD, which only had enough storage capacity to host the smallest of Windows virtual machines, so I scrounged up the two other disks from other desktops that were slated for recycling. I am particularly proud of the Seagate because, if the datecode on the drive is to be believed, it was originally manufactured sometime in late 2009.

A bit of a pig huh? Let’s see if we can make this little porker fly.

A picture of the inside of HumbleLab

Oh yeah… look at that quality hardware and cable management. Gonna be hosting prod workloads on this baby.

I started out with a pretty simple/lazy install of Windows Server 2012 R2 and the Hyper-V role. At this point in time I only had the original 128GB SSD that the operating system was installed on and the ancient Seagate being utilized for .VHD/.VHDX storage.

Performance was predictably abysmal, especially once I got a SQL VM set up and “running”:

IOmeter output

At this point I added in the other 250GB SSD, destroyed the volume I was using for .VHD/.VHDX storage and recreated it using Storage Spaces. I don’t have much to say about Storage Spaces here since I have such a simple/stupid setup. I just created a single Storage Pool using the 250GB SSD and the 1TB SATA drive. Obviously, with only two disks I was limited to a Simple Storage Layout (no disk redundancy/YOLO mode). I did opt to create a larger 8GB Write Cache using PowerShell but other than that I pretty much just clicked through the wizard in Server Manager.
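The write cache was the only piece that needed PowerShell; a sketch of the idea (the pool name, tier names and tier sizes are placeholders for my two-disk setup):

  # A tiered Simple (no redundancy) virtual disk with an 8GB write-back cache
  New-VirtualDisk -StoragePoolFriendlyName 'LabPool' -FriendlyName 'VMStore' `
      -StorageTiers (Get-StorageTier -FriendlyName 'SSDTier'), (Get-StorageTier -FriendlyName 'HDDTier') `
      -StorageTierSizes 200GB, 900GB -ResiliencySettingName Simple -WriteCacheSize 8GB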

 

Let’s see how we did:

IOMeter Results with Storage Tiers

A marked improvement! We tripled our IOPS from a snail-like 234 to a tortoise-like 820 and managed to reduce the response time from 14ms to 5ms. The latency reduction is probably the most important. We generally shoot for under 2ms for our production workloads but considering the hardware 5-6ms isn’t bad at all.

 

What if I just run the .VHDX file directly on the shared 128GB SSD that the Hyper-V host is utilizing, without any Storage Tiers involved at all?

Hmm… not surprisingly the results are even better, but what was surprising is by how much. We are looking at sub-2ms latency and about four and a half times more IOPS than what my Storage Spaces Virtual Disk can deliver.

Of course benchmarks, especially quick and dirty ones like this, are very rarely the whole story and likely do not even come close to simulating your true workload, but at least they give us a basic picture of what my aging hardware can do: SATA = Glacial, Storage Tiers with SSD Caching = OK, SSD = Good. It also illustrates just how damn fast SSDs are. If you have a poorly performing application, moving it over to SSD storage is likely going to be the single easiest thing you can do to improve its performance. Sure, the existing bottleneck in the codebase or database design is still there, but does that matter anymore if everything is moving 4x faster? Like they say, Hardware is Cheap, Developers are Expensive.

I put this together prior to the general release of Server 2016, so it would be interesting to see if running this same setup on 2016’s implementation of Storage Spaces with ReFS instead of NTFS would yield better results. It also would be interesting to refactor the SQL database and at the very least place the TempDB, SysDBs and Log files directly onto the host’s 128GB SSD. A project for another time I guess…

Until next time… may your pigs fly!

A flying pig powered by a rocket

Additional reading / extra credit: