
Salary, Expectations and Automation

It has been an interesting few months. We have had a few unexpected projects pop up and I have ended up owning most of them. This left me feeling pretty beaten down and a little bit demoralized. I don’t like missing deadlines and I don’t like constantly switching from one task to the next without ever making headway. It’s not my preferred way to work.

One thing that I am continually trying to remind myself of is that I should use the team. I don’t have to own everything, nor should I, so I started creating tickets on behalf of my users (we don’t have a policy requiring tickets) and dumping them into our generic queue so someone else could pick them up.

Guess what happened? They sat there. Now there are a few reasons why things played out this way (see this post) but you can imagine this was not the result I was hoping for. I was hoping my tier-2 folks would have jumped in and grabbed some of these requests:

  • Review the GPOs applied to a particular group of servers and modify them to accommodate a new service account
  • Review some NTFS permissions and restructure them to be more granular
  • Create a new IIS site along with the corresponding certificate and coordinate with our AD team to get the appropriate DNS records put in place
  • Help one of our dev teams re-platform / upgrade a COTS application
  • Re-configure IIS on a particular site to support HTTPS (see the sketch just after this list).
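
To give a sense of what that last item involves, here is a minimal sketch using PowerShell and the WebAdministration module. The site name and certificate lookup are made up for illustration; the real ticket would name a specific site and certificate.

```powershell
# Hypothetical example: add an HTTPS binding to an existing IIS site and attach
# a certificate that is already in the local machine store. Names are invented.
Import-Module WebAdministration

New-WebBinding -Name 'ExampleSite' -Protocol https -Port 443

$cert = Get-ChildItem Cert:\LocalMachine\My |
    Where-Object { $_.Subject -like '*example.gov*' } |
    Select-Object -First 1

New-Item -Path 'IIS:\SslBindings\0.0.0.0!443' -Value $cert
```

A few lines of PowerShell or a few minutes of clicking in IIS Manager; either way, this is bread-and-butter sysadmin work.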

Part of the reason we have so much work right now is that we are assuming responsibility for a department that previously had their own internal IT staff (Yay! Budget cuts!). Not everyone was happy about giving up “their IT guys”, and so during our team meetings we started reviewing work in the queue that was not getting moved along.

A bunch of these unloved tickets were “mine”, that is to say, they were originally requests that came directly to me, which I then turned into tickets hoping to bump them back into the queue. This should sound familiar. The consensus, though, was that it was “my work” and that I was not being diligent enough in keeping track of the ticket queue.

Please bear in mind for the next two paragraphs that we have a small, 12-person team. It is not difficult for us to get hold of another team member.

I’ll unpack the latter idea first. In a sense, I agree. I could do a better job of watching the queue, if only because I was not watching it at all. My perception, as someone nominally at the top of our support tier, was that our help desk watches the queue, catches interrupts from customers and then escalates things if they need assistance. I was thinking my tickets should come from my team and not directly from the queue.

The former idea I’m a little less sympathetic to. It’s not “my work”, it’s the team’s work, right? And here is where those sour grapes start to ferment… that list of tickets up there does not seem like “tier-3 work” to me. It seems like junior sysadmin work. If that is not the case then I have to ask: what are those guys doing instead? If that’s not “work” that tier-1/tier-2 handle, then what is?

In the end, of course, I took the tickets and did the work, which put me even further behind on some of my projects.

I have puzzled over our ticket system, support process and team dynamics quite a bit (see here, here and here) and there are a lot of different threads one could pull on, but a new explanation came to mind after this exercise: maybe our tier-2 guys are not doing this work because they can’t? Maybe they just don’t have the skills to do those kinds of things, and maybe it’s not realistic to expect people to have that level of skill, independence and work ethic for what we pay them? I hate this idea. I hate it because if that’s truly the case there is very little I can do to fix it. I don’t control our training budget, assign team priorities or have any ability to negotiate graduated raises matched with a corresponding training plan. I don’t do employee evaluations, I cannot put someone on an improvement plan and I certainly cannot let an employee go. But I really don’t like this idea because it feels like I’m crapping on my team. I don’t like it because it makes me feel guilty.

But are our salaries and expectations unrealistic?

  • Help Desk Staff (Tier-1) – $44k – $50k per year
  • Junior Sysadmins (Tier-2) – $58k – $68k per year
  • Sysadmins (Tier-3) – $68k – $78k per year

It’s a pretty normal “white collar” setup: salaried, no overtime eligibility, with health insurance and a 401k with a decent employer match. We can’t really do flexible work schedules or work-from-home, but we do have a pretty generous paid leave policy. However – this is Alaska, where everything is as expensive as the scenery is beautiful. A one-bedroom rental will run you at least $1,200 a month plus utilities, which can easily be a few hundred dollars in the winter depending on your heating source. Gasoline is on average a dollar more per gallon than whatever it is currently in the Lower 48. Childcare is about $1,100 a month per kiddo for kids under three. Your standard “American dream” three bedroom, two bath house will cost you around $380,000. All things being equal, it is about 25% more expensive to live here than your average American city, so when you think about these wages, knock a quarter off to adjust for cost of living.

Those wages don’t look so hot anymore, huh? Maybe there is a reason (other than our State’s current recession) that most IT positions in my organization take at least six months to fill. The talent pool is shallow and not that many people are interested in wading in.

We all have our strengths and weaknesses. I suspect our team is much like others, with a spectrum of talent, but I think the cracks are beginning to show… as positions are cut, work is not being evenly distributed and fewer and fewer team members are taking on more and more of the work. I suspect that’s because these team members have the skills to eat that workload with automation. They can use PowerShell to do account provisioning instead of clicking through Active Directory Users and Computers. They can use SCCM to install Visio instead of RDPing and pressing Next-Next-Finish on each computer. A high performing team member would realize that the only way they could do that much work was to learn some automation skills. A low performing team member would do what instead? I’m not sure. But maybe, just maybe, as we put increasing pressure on our tier-1 and tier-2 staff to “up their skills” and to “eat the work”, we are not being realistic.
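
To be concrete about what “eating the work with automation” looks like, here is a minimal sketch of bulk account provisioning. It assumes the RSAT ActiveDirectory module and a CSV export of new hires; the column names, OU path and domain are invented for illustration.

```powershell
# Hypothetical example: provision accounts from a CSV instead of clicking
# through Active Directory Users and Computers one user at a time.
Import-Module ActiveDirectory

Import-Csv -Path .\new-hires.csv | ForEach-Object {
    New-ADUser -Name "$($_.First) $($_.Last)" `
               -SamAccountName $_.Username `
               -Path 'OU=Staff,DC=example,DC=gov' `
               -AccountPassword (ConvertTo-SecureString $_.TempPassword -AsPlainText -Force) `
               -Enabled $true

    # Add the user to their department group (assumes the group already exists)
    Add-ADGroupMember -Identity $_.Department -Members $_.Username
}
```

Ten accounts or a thousand, the script does not care; that is the whole point.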

Would you expect someone making $44k – $51k a year in cost-of-living-adjusted wages to be an SCCM wizard? Or to pick up PowerShell?

Are we asking too much of our staff? What would you expect someone paid these wages to be able to do? Like all my posts, I have no answers, only questions, but hopefully I’m asking the right ones.

Until next time, stay frosty!

Kafka in IT: How a Simple Change Can Take a Year to Implement

Public sector IT has never had a reputation for being particularly fast-moving or responsive. In fact, it seems to have a reputation for being staffed by apathetic, under-skilled workers toiling away in basements and boiler rooms, supporting legacy, “mission-critical”, monolithic applications that sit half-finished and half-deployed by their long-gone and erstwhile overpaid contractors (*cough* Deloitte, CGI *cough*). This topic might seem familiar… see Budget Cuts and Consolidation and Are GOV IT teams apathetic?

Why do things move so slowly, especially in a field that demands the opposite? I don’t have an answer to that larger question, but I do have an object lesson; well, maybe what I really have is part apology, part explanation and part catharsis. Gather around and hear the tale of how a single change to our organization’s perimeter proxy devices took a year!

 

03/10

We get a ticket stating that one of our teams’ development servers is no longer letting them access it via UNC share or RDP. I assign one of our tier-2 guys to take a look and a few days later it gets escalated to me. The server will not respond to any incoming network traffic, but if I access it via console and send traffic out it magically works. This smells suspiciously like a host-based firewall acting up but our security team swears up and down our Host Intrusion Protection software is in “detect” mode and I verified that we have disabled the native Windows firewall. I open up a few support tickets with our vendors and start chasing such wild geese as a layer-2 disjoint in our UCS fabric and “asymmetric routing” issues. No dice. Eventually someone gets the smart idea to move the IP address to another VM to try and narrow the issue down to either the VM or the environment. It’s the VM (of course it is)! These shenanigans take two weeks.

04/01

I finish re-platforming the development server onto a new Server 2012 R2 virtual machine. This in and of itself would be worth a post, since the best way I can summarize our deployment methodology is “guess-and-check”. Anyway, the immediate issue is now resolved. YAY!

05/01

I rebuild the entire development, testing, staging and production stack and migrate everything over except the production server, which is publicly accessible. The dev team wants to do a soft cutover instead of just moving the IP address to the new server. This means we will need to have our networking team make some changes to the perimeter proxy devices.

05/15

I catch up on other work and finish the roughly ten pages of forms, diagrams and a security plan that are required for a perimeter device change request.

06/06

I open a ticket upstream, discuss the change with the network team and make some minor modifications to the ticket.

06/08

I filled out the wrong forms and/or I filled them out incorrectly. Whoops.

06/17

After a few tries I get the right forms and diagrams filled out. The ticket gets assigned to the security team for approval.

06/20

Someone from the security team picks up the ticket and begins to review it.

07/06

Sweet! Two weeks later my change request gets approval from the security team (that’s actually pretty fast). The ticket gets transferred back to the networking team, which begins to work on implementation.

07/18

I create a separate ticket to track the required SSL/TLS certificate I will need for the HTTPS-enabled services on the server. This ticket follows a similar, parallel process: documentation is filled out and validated, goes to the security team for approval and then back to the networking team for implementation. My original ticket for the perimeter change is still being worked on.

08/01

A firmware upgrade on the perimeter devices breaks high availability. The network team freezes all new work until the issue is corrected (they start their internal change control process for emergency break/fix issues).

08/24

The server’s HTTPS certificate has to be replaced before it expires at the end of the month. Our dev’s business group coughs up the few hundred dollars. We had planned to use the perimeter proxies’ wildcard certificate for no extra cost but oh well, too late.

09/06

HA restored! Wonderful! New configuration changes are released to the networking team for implementation.

10/01

Nothing happens upstream… I am not sure why. I call about once a week and hear: “We are swamped, two weeks until implementation. Should be soon.”

11/03

The ticket gets transferred to another member of the network team and within a week the configuration change is ready for testing.

11/07

The dev team discovers an issue. Their application is relying on the originating client IP address for logging and what basically amounts to “two-factor authentication” (i.e., a username is tied to an IP address). This breaks fantastically once the service gets moved behind a web proxy. Neat.

11/09

I work with the dev lead and the networking team to come up with a solution. It turns out we can pass the originating IP address through the proxies, but it changes the server-side variable that their code needs to reference.
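
For the curious, the application-side fix is conceptually tiny. Here is a hedged sketch of the idea (not the dev team’s actual code): once a proxy sits in front of the server, the socket address the application sees is the proxy itself, and the original client has to be read from a forwarded header instead. The header name assumes a standard X-Forwarded-For setup on the proxies.

```powershell
# Hypothetical sketch: prefer the proxy-supplied X-Forwarded-For header over
# the raw socket address, falling back to the socket address when no proxy is involved.
function Get-ClientIp {
    param(
        [hashtable]$Headers,      # request headers as the application sees them
        [string]$RemoteAddress    # the socket peer address (the proxy, once it is in the path)
    )

    $xff = $Headers['X-Forwarded-For']
    if ($xff) {
        # X-Forwarded-For can be a comma-separated chain; the first entry is the client
        return ($xff -split ',')[0].Trim()
    }
    return $RemoteAddress
}

# Behind the proxy, the socket peer is the proxy but the header carries the client:
Get-ClientIp -Headers @{ 'X-Forwarded-For' = '203.0.113.42, 10.1.2.3' } -RemoteAddress '10.1.2.3'
# => 203.0.113.42
```

A small change in most frameworks, but as the next entry shows, even a small change needs the business’s blessing.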

11/28

Business leaders say that the code change is a no-go. We are about to hit their “code/infrastructure freeze” period that lasts from December to April. Fair enough.

12/01

We hit the code freeze. Things open back up again in mid-April. Looking ahead, I already have infrastructure work scheduled for late April and early May, which brings us right around to June: one year.

EDIT: The change was committed on 05/30 and we passed our rollback period on 06/14. As of 06/19 I just submitted the last ticket to our networking team to remove the legacy configuration.

 

*WHEW* Let’s take a break. Here’s doge to entertain you during the intermission:

 

My team is asking for a change that involves taking approximately six services that are already publicly accessible via a legacy implementation, moving those services to a single IP address and placing an application proxy between the Big Bad Internet and the hosting servers. Nothing too crazy here.

Here’s some parting thoughts to ponder.

  • ITIL. Love it or hate it, ITIL adds a lot of drag. I hope it adds some value.
  • I don’t mean to pick on the other teams, but it clearly seems like they don’t have enough resources (expertise, team members, whatever they need, they don’t have enough of it).
  • I could have done better with all the process stuff on my end. Momentum is important, so I probably should not have let some of that paperwork sit for as long as it did.
  • The specialization of teams cuts both ways. It is easy to slip from being isolated and silo-ed to just basic outright distrust, and when you assume that everyone is out to get you (probably because that’s what experience has taught you) then you C.Y.A. ten times till Sunday to protect yourself and your team. Combine this with ITIL for a particularly potent blend of bureaucratic misery.
  • Centralized teams like networking and security that are not embedded in different business groups end up serving a whole bunch of different masters, all of whom are going in different directions and want different things. In our organization this seems to mean that the loudest, meanest person holding their feet to the SLA gets whatever they want, at the expense of quieter customers like myself.
  • Little time lags magnify delay as the project goes on. Two weeks in security approval limbo puts me four weeks behind a few months down the road which means I then miss my certificate expiry deadline which then means I need to fill out another ticket which then puts me further behind and so on ad infinitum.
  • This kind of shit is why developers are just saying “#YOLO! Screw you Ops! LEEEEROY JENKINS! We are moving to the Cloud!” and ignoring all this on-prem, organizational pain and doing DevDev (it’s like DevOps but it leads to hilarious brokenness in other new and exciting ways).
  • Public Sector IT runs on chaos, disorder and the frustrations of people just trying to Do Things. See anything ever written by Kafka.
  • ITIL. I thought it was worth mentioning twice because that’s how much overhead it adds (by design).

 

Until next time, may your tickets be speedily resolved.

The Big Squeeze, Predictions in Pessimism


Layoff notice or stay the hell away from Alaska when oil is cheap… (from r/sysadmin)

 

I thought this would be a technical blog acting as a surrogate for my participation on ServerFault, but instead it has morphed into some kind of weird meta-sysadmin blog / soap box / long-form reply on r/sysadmin. I guess I am OK with that…

Alaska is a boom-and-bust economy, and despite having a lot going for us fiscally, a combination of our tax structure, oil prices and the Legislature’s approach to the ongoing budget deficit means we are doing our best to auger our economy into the ground. Time for a bit of gallows humor to commiserate with u/Clovis69! The best part of predictions is that you get to see how hilariously uninformed you were down the road! Plus, if you are going to draw straws you might as well take bets on who gets the shortest one.

Be forewarned: I am not an economist, I am not even really that informed, and if you are my employer, future or otherwise, I am largely being facetious.

The Micro View (what will happen to me and my shop)

  • We will take another 15-20% personnel cut in IT operations (desktop, server and infrastructure support). That will bring us close to a 45% reduction in staff since 2015.
  • We will take on additional IT workload as our programming teams continue to lose personnel and consequently shed operational tasks they were doing independently.
  • We will be required to adopt a low-touch, automation-centric support model in order to cope with the workload. We will not have the resources to do the kind of interrupt-driven, in-person support we do now. This is a huge change from our current culture.
  • We will lean really hard on folks that know SCCM, PowerShell, Group Policy and other automation frameworks. Tier-2/Tier-3 will come under more pressure as the interrupt rate increases due to the reduction in Tier-1 staff.
  • Team members that do not adopt automation frameworks will find themselves doing whatever non-automatable grunt work there is left. They will also be more likely to lose their jobs.
  • We will lose a critical team member who is performing this increased automation work, as they can simply get paid better elsewhere without having a budget deficit hanging over their head.
  • If we do not complete our consolidation work to standardize and bring silo-ed teams together before we lose what little operational capacity we have left, our shop will slip into full-blown reactive mode. Preventive maintenance will not get done and in two years’ time things will be Bad (TM). I mean like straight-up r/sysadmin horror story Bad (TM).
  • I would be surprised if I am still in the same role in the same team.
  • We will somehow have even more meetings.

The Macro View (what will happen to my organization)

Preliminary plans to consolidate IT operations were introduced back in early 2015. In short, our administrative functions, including IT operations, are largely decentralized and done at the department level. This leads to a lot of redundant work being performed, poor alignment of IT with the business goals of the organization as a whole, the inability to capture or recover value from economies of scale, and widely disparate resources, functionality and service delivery. At a practical level, what this means is there are a whole lot of folks like myself all working to assimilate new workload, standardize it and then automate it as we cope with staff reduction. We are all hurriedly building levers to help us move more and more weight, but no one has stopped to say, “Hey guys, if we all work together to build one lever we can move things that are an order of magnitude heavier.” Consequently, as valiant as our individual efforts are, we are going to fail. If I lose four people out of a team of eight, no level of automation that I can come up with will keep our heads above water.

At this point I am not optimistic about our chances for success. The tempo of a project is often determined by its initial pace. I have never seen an IT project move faster as time goes on in the public sector; generally it moves slower and slower as it grinds through the layers of bureaucracy and swims upstream against the prevailing current of institutional inertia and resistance. It has been over a year without any progress that is visible to rank-and-file staff such as myself, and we only have about one, maybe two, years of money left in the piggy bank before we find that the income side of our balance sheet is only 35% of our expenses. To make things even more problematic, entities that do not want to give up control have had close to two years to actively position themselves to protect their internal IT.

I want IT consolidation to succeed. It seems like the only possible way to continue to provide a similar level of service in the face of a 30-60% staff reduction. I mean, what the hell else are we going to do? Are we going to keep doing things the same way until we run out of money, turn the lights off and go home? If it takes one person on my team to run SCCM for my 800 endpoints, and three people from your team to run SCCM for your 3000 endpoints, how much do you want to bet the four of them could run SCCM for all 12,000 of our endpoints? I am pretty damn confident they could. And this scenario repeats everywhere. We are bailing out our boats, and in each boat is one piece of a high-volume bilge pump, but we don’t trust each other, no one is talking, and we are all moving in a million different directions instead of stopping, collectively getting over whatever stupid pettiness keeps us from actually doing something smart for once, and putting together our badass high-volume bilge pump. We will either float together or drown separately.

I seem to recall a similar problem from our nation’s history…

[Image: Benjamin Franklin’s “Join, or Die” political cartoon]

Are GOV IT teams apathetic?

I have been stewing about this post on r/sysadmin, “Is apathy a problem in most government IT teams?”, for a while and felt like it was worth a quick write-up since most of my short IT career has been spent in the public sector.

First off, apathy and team dysfunction are problems everywhere. There is nothing unique about government employees versus private employees in that respect. What I think the poster is really asking is, “Is there something about government IT that produces apathetic teams?” and if you read a little deeper it seems like apathy really means “permanent discouragement”; that is to say, the condition where change, “doing things right or better”, or greater efficiency are, or seemingly are, impossible. When you read something like, “…trying to make things more efficient is met with reactions like ‘oh you naive boy’ and finger pointing,” it is hard to see it as just plain old vanilla apathy.

Government is not a business (despite what some people think). Programs operate at a loss and are subsidized, in many cases entirely, by taxes because the public and/or their representatives deem those programs worthy. The failure mechanism of market competition doesn’t exist. Incredibly effective programs can be cancelled because they are no longer politically favorable and incredibly ineffective programs can continue or expand because they have political support. Furthermore, in all things public servants need to remain impartial, unbiased and above impropriety. This leads to vast and byzantine processes, the components of which singularly make eminently good sense (for example, the prohibition of no-bid contracts) but collectively all these well-intentioned barnacles slow the ship-of-state dramatically. Success is not rewarded with growth either. Implementing a more efficient process or a more cost-effective infrastructure and saving money generally results in less money. This tendency toward budget reduction (“Hey, if you saved it, you did not need it to begin with, right?”) turns highly functioning teams into disasters over time as they lose resources. Paradoxically, the better you are at utilizing your existing resources, the less you get. Finally, your entire leadership changes with every administration change. You may still be shoveling coal down in the engine room, but the new skipper just sent down word to reduce steam and come about hard in order to head in the opposite direction. Generally, private companies that do this kind of thing, with this frequency, do not last long.

How does all this apply to Information Technology? It means that your organization will move very, very slowly while technology moves very, very fast. Not a good combo.

 

Those are the challenges that a team faces but what about the other half of the equation… the people facing them?

Job classes are just one small part of this picture, but they are emblematic of some of the challenges that face team leads and managers when dealing with the ‘People’ piece of People, Process and Technology (ITIL buzzword detected! +5 points). The idea of job classes is that, across the organization, people doing similar work should be paid the same. The problem lies in the fact that updating a job class is beyond onerous and the time to completion is measured in years. Do you know how quickly Information Technology reinvents itself? Really quick. This means that job classes and their associated salaries tend to drift away from the actual on-the-ground work being done and the appropriate compensation level over time, making recruitment of new staff and retention of your best staff very difficult (The Dead Sea Effect). If you combine this with a lack of training and professional development, staff have a tendency to get pigeon-holed into a particular role without a clear promotion path. Furthermore, many of the job class series are disjointed in such a way that working at the top of one job series will not meet the prerequisites for another job series, making advancement difficult and, at least on paper, sometimes impossible. For example: you could work as a Lead Programmer for three years leading a team of five people and not qualify, at least on paper, for an entry-level IT Manager position.

How does all this apply to Information Technology? People get stuck doing one job, for too long, with no professional training or mentorship. Their skillsets decline towards obsolescence and they become frustrated and discouraged.

 

I have never met anyone in the public sector that just straight up did not give a crap. I have met people that feel stuck, discouraged, marginalized and ignored. And rightly so. Getting stuff done is very hard. It is like everyone has one ingredient necessary to make a cake, and you all more or less agree on the recipe. You are all trained and experienced bakers. You could easily make a cake, but you each have 100 pieces of paperwork you have to fill out and wait on, sometimes for months, before you can do your part of the cake-baking process. You have 10 different bosses, each telling you to make a different dessert, when you know that cakes are by far the best dessert for your particular bakery. Then you get yelled at for not making a cake in a timely manner, and then you all get fired and replaced by food service contractors whose parent company charges an exorbitant hourly rate. But hey, the public eventually got their cake, right? Or at least a donut. Not exactly what they ordered but better than nothing… right?

If IT is a thankless job (and I am not sure I agree with that piece of Sysadmin mythology), then Public Sector IT is even more thankless. You will face a Kafkaesque bureaucracy. You will likely be very underpaid and have a difficult time seeking promotion. You will never be able to accept a vendor-provided gift or meal over the price of $25. You will laugh when people ask if you plan on attending VMworld. The public will stereotype you as lazy, ineffective and overpaid. But you will persevere. You have a duty to the body politic to do your best with what you have. You will keep the lights on; you will keep the ship afloat even as more and more water pours in. You have to. Because that’s what Systems Administrators do.

And all you wanted was to simply make a cake.