I am going to be upfront with you: you are about to read a long, meandering post that will seem almost a little too whiny at times, where I talk some crap about our developers and their burdens (applications). I like our dev teams, and I like to think I work really well with their leads, so think of this post as a bit of satirical sibling rivalry. Underneath the hyperbole and good-natured teasing there might be a small, “little-t” truth.
That truth is that operations, whether it’s the database administrator, the network team, the sysadmins or the help desk, always, always, always gets the short straw, and that is because collectively we own “the forest” that the developers tend to their “trees” in.
I have a lot to say about the oft-repeated sysadmin myth about “how misunderstood sysadmins are,” how they just seem to get stepped on all the time, and so on. I am not a big fan of “special snowflake sysadmin syndrome,” and especially not a fan of it when it is used as an excuse to be rude or unprofessional. At the risk of contradicting my earlier statement, it is worth stating that even I know I am half full of crap when I say sysadmins always get the short straw.
OK, disclaimers are all done! Let’s tell some stories!
DevOps – That means I get Local Admin, right?
My organization is quite granular: each of our departments more or less maintains its own development team supporting its own mission-specific applications, along with either a developer who essentially fulfills an operations role or a single operations guy doing support. The “central” team maintains things like the LAN, Active Directory, the virtualization platform and so on. If the powers on high wanted a new application for their department, the developers would request the required virtual machines, the operations team would spin up a dozen VMs off of a template, join them to AD, give the developers local admin, and they’d be on their merry way.
Much like Bob Belcher, all the Ops guys could do was “complain the whole time.”
This arrangement has led to some amazing things that break in ways too awesome to truly describe:
- We have a department with a staff of 120 and 180 Active Directory security groups. At last count, some 45 are completely empty. Auditing NTFS permissions is… uh, difficult?
- We have an in-house application that uses SharePoint as a front-end and calls some in-house web services tied to a database or two, all to auto-populate an Excel spreadsheet that is used for timekeeping. Everyone else just fills out the spreadsheet.
- We have another SharePoint-integrated application, used ironically enough for compliance training, that passes your Active Directory credentials in plaintext through two or three servers all hosting different web services.
- Our deployment process is to use Windows File Explorer to copy everything off your workstation onto the IIS servers.
- Our revision control is: E:\WWW\Site, E:\WWW\Site (Copy), E:\WWW-Site-Dev McDeveloper
- We have an application that manages account on-boarding, a process that is already automated by our Active Directory team.
- We had, at one point in time, four or five different backup systems, all of which used BackupExec for some insane reason, and three of which backed up the same data.
- And then there’s Jenga: Enterprise Edition…
Jenga: Enterprise Edition – Not so fun when it needs four nines of uptime.
What you are looking at is my humorous attempt to scribble out a satirical sketch of one of our line-of-business applications, which managed to turn out pretty accurate. The Jenga application is so named because all the pieces are interconnected in ways that turn the prospect of upgrading any of it into the project of upgrading all of it. Ready? ’Ere we go!
It’s built around a core written in a language that we haven’t had any on-staff expertise in for the better part of ten years. In order to provide the functionality the business needed as the core aged, the developers wrote new “modules” in more current and maintainable languages that essentially just call APIs or exposed services, and bolted them on. The database is relatively small, around 6 TB, but almost 90% of it is static read-only data that we cannot separate out, which drastically reduces the cool things our DBA and I can do in terms of recovery, backup, replication and performance optimization. There are no truly separate development or testing environments, so we use snapshot copies to expose what appear to be “atomic” copies of the production data (which contains PII!) on two or three other servers so our developers can validate application operations against it. We used to do this with manual fricking database restores, which was goddamned expensive in terms of time and storage.
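To be fair to the snapshot approach, the part that still makes me twitch is the PII. Even a crude scrub pass over the cloned data before the developers touch it would help. Here is a minimal sketch of the idea — sqlite3 stands in for our actual database engine, and the table and column names are entirely made up for illustration:

```python
import sqlite3

# Illustrative only: the real refresh targets a production-grade RDBMS;
# "customers", "ssn" and "email" are hypothetical names, not our schema.
def mask_pii(conn):
    """Scrub obvious PII after cloning production data into a dev copy."""
    cur = conn.cursor()
    # Keep the last four digits so lookups still "feel" real to testers.
    cur.execute("UPDATE customers SET ssn = 'XXX-XX-' || substr(ssn, -4)")
    # Replace real addresses with per-row synthetic ones on a reserved TLD.
    cur.execute("UPDATE customers SET email = 'user' || id || '@example.invalid'")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, ssn TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, '123-45-6789', 'alice@corp.example')")
mask_pii(conn)
print(conn.execute("SELECT ssn, email FROM customers").fetchone())
# → ('XXX-XX-6789', 'user1@example.invalid')
```

It isn’t real anonymization by any stretch, but a pass like this between “snapshot” and “hand to developers” is a lot better than exposing raw PII on two or three extra servers.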
There are no fewer than eight database servers involved, but the application cannot be distributed or set up in any kind of multi-master deployment with convergence, so staff at remote sites suffer abysmal performance whenever anything resembling contention happens on their shared last-mile connections. The “service accounts” are literally user accounts that the developers use to RDP to the servers, start the application’s GUI, and then enable the application’s various services by interacting with the above-mentioned GUI (any hiccup in the RDP session and *poof*, there goes that service). The public-facing web server directly queries the production database (our DBA’s favorite piece). The internally consumed pieces of the application and the externally consumed pieces are co-mingled, meaning an outage anywhere is an outage everywhere. The client requires a hard-coded drive map to run, since application upgrades are handled internally with copy jobs that replace all the local .DLLs when new ones are detected. Oh, and it runs on out-of-support versions of SQL Server.
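A quick aside on those GUI-launched “services”: the whole point of running something as an actual service (or at least under a supervisor/wrapper) is that a machine, not a human holding an RDP session open, notices the crash and relaunches the process. A toy supervision loop, just to illustrate the concept — the command and retry policy here are placeholders, and on Windows the real answer would be a proper service:

```python
import subprocess
import sys
import time

def supervise(cmd, max_restarts=3, backoff=0.1):
    """Run cmd, relaunching it after a crash, up to max_restarts times.

    Returns the number of restarts performed. A real service manager does
    this (plus logging, backoff policy, alerting, etc.) so nobody has to
    babysit a GUI over RDP.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:  # clean exit: nothing to do
            return restarts
        if restarts >= max_restarts:  # give up and page a human
            return restarts
        restarts += 1
        time.sleep(backoff)  # crashed: brief pause, then relaunch

# Simulate a process that always crashes (nonzero exit code).
print(supervise([sys.executable, "-c", "raise SystemExit(1)"]))  # → 3
# Simulate a clean run.
print(supervise([sys.executable, "-c", "raise SystemExit(0)"]))  # → 0
```

Ten lines of supervision logic is not a fix for the architecture, but it shows how little it takes to beat “re-RDP to the box and click the button again.”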
Whew. That was a lot. Sorry about that. Despite the fact that a whole department pretty much lives or dies by this application’s continued functionality, our devs haven’t made much progress in re-architecting and modernizing it. Now, this really isn’t their fault, but it doesn’t change the fact that my team has an increasingly hard time keeping this thing running in a satisfactory manner.
Operations: The Digital Custodian Team.
Somewhere in our brainstorming session about how to move Jenga to a new virtualization infrastructure — all on a weekend when I’ll be traveling, in order to squeeze the outage into the only period within the next two months that wasn’t going to be unduly disruptive — I began to feel like my team was getting screwed. They have more developers supporting this application than we have in our whole operations team, and it’s on us to figure out how to move Jenga without losing any blocks or having any lengthy service windows? What are those guys actually working on over there? Why are we trying to figure out which missing .DLL from .NET 1.0 needs to be imported onto the new IIS 8.5 web server so that some obscure service no one really understands runs in a supported environment? Why does operations own the life-cycle management? Shouldn’t the developers be updating and re-writing code to reflect the underlying environmental and API changes each time a new server OS is released with a new set of libraries? Why are our business expectations for application reliability so wildly out of sync with what the architecture can actually deliver? What’s going on here?
Honestly? I don’t know, but it sucks. It sucks for the customers, it sucks for the devs, but mostly it sucks for my team, because we have to support four other line-of-business applications. We own the forest, right? So when a particular tree catches on fire, they call us to figure out what to do. No one mentions that we probably shouldn’t expect trees wrapped in paraffin wax and then doused in diesel fuel not to catch on fire. When we point out that tending trees in this manner probably won’t deliver the best results if you want something other than a bonfire, we are met with a vague shrug.
Is this how it works? Your team of rockstar, “creative-type” code-poets whips up some kind of amazing business application, celebrates, and then hands it off to operations, where we have to figure out how to keep it alive for the next 20 years as the platform and code base age into senility? I mean, who owns the on-call phone for all these applications? Hint: it’s not the dev team.
I understand that sometimes messes happen… so why does it feel like we are the only ones cleaning them up?
You’re not my Supervisor! Organizational Structure and Silos!
At first blush I was going to blame my favorite patsy, Process Improvement and the insipid industry around it, for this current state of affairs, but after some thought I think the real answer here is something much simpler: the dev team and my team don’t work for the same person. Not even close. If we play a little game of “trace the organizational chart,” we have five layers of management before we reach a position that has direct reports that eventually lead to both teams. Each one of those layers is a person – with their own concerns, motivations, proclivities and spin they put on any given situation. The developers and operations team (“dudes that work”) more or less agree that the design of the Jenga application is Not a Good Thing (TM). But as each team gets directed to move in a certain direction by each layer of management, our efforts and goals diverge. No amount of fuzzy-wuzzy DevOps or new-fangled Agile Standup Kanban Continuous Integration Gamification Buzzword Compliant bullshit is ever going to change that. Nothing makes “enemies” out of friends faster than two (or three or four) managers maneuvering for leverage and dragging their teams along with them.

I cannot help but wonder what our culture would be like if the lead devs sat right next to me and we established project teams out of our combined pool of developer and operations talent as individual departments put forth work. What would things be like if our developers weren’t chained to some stupid line-of-business application from the late ’80s, toiling away to polish a turd and implement feature requests like some kind of modern Promethean myth? What would things be like if our operations team wasn’t constantly trying to figure out how to make old crap run while our budgets and staff are whittled away, snatching victory from defeat time and time again, only to watch the cycle of mistakes restart itself again and again like some kind of Sisyphean dystopia with cubicles?
What if we could sit down together and I dunno… fix things?
Sorry, there are no great conclusions or flashes of prophetic insight here; I am just as uninformed as the rest of the unwashed masses. But I cannot help but think that maybe, just maybe, we have too many chefs in the kitchen arguing about the menu. But then again, what do I know? I’m just the custodian.
Until next time, stay frosty.