To a newborn every joke is new, so I specialize in old jokes and tell them over and over again.

Floorshrink diaries

TLDR #6.1: The hurdles of creating a private cloud

April 12, 2020 - Floorshrink

The following post is about the pitfalls of creating a private cloud. My aim is to list some of the potential pitfalls on the road to implementing a private cloud, not to take away your ambition. Quite the opposite: you are racing against time. Once the regulatory environment (in CEE) lifts the restrictions on using offerings from public cloud providers, you are going to face formidable competition from offerings that have been honed for more than ten years. You have to find niches where you can beat them before this change happens. So here we go:

Problem #0: you do not actually know why you want to have a private cloud. (I added this paragraph after receiving an interesting comment on the original post.) The elephant in the room: you need applications that can utilize the capabilities of the new infrastructure, most notably auto scaling to serve the actual requirements. And this is a far bigger issue than the infrastructure itself.

Problem #1: the foundation is insufficient. Your virtualized environment is probably the product of several years of patchwork technology choices and makeshift automation fragments. You are tempted to listen to the siren song of the vendors’ sales folks and downplay the importance of dreary things like processes, let alone automation, your mind spellbound by the big, shiny iron.

 Ask yourself these questions:

  • Do you run a service-based operating model? To be less academic, do you have a service catalog that is actually used by your customers?
  • Do you have technology standards that you can actually enforce, or do you bend over backwards for every exotic request? (Imagine your test matrix with 3 hypervisors, 4 server OSes and 2 application servers.)
  • How long does it take to serve a new request for a VM? If the answer is 4+ months, then identify what you will (have to) change if you want to bring down this time to a few hours.
  • Do you know your existing compute-storage-network capacity and their utilization?
  • Do you know which workloads consume the above capacity, and can you pair the applications with their infrastructure layer? (Do you have an up-to-date technical asset management DB?)
  • Do you have a well-documented load balancing, High Availability and Disaster Recovery service that you provide to your clients? You can be assured that they will not settle for anything less than they have today.
  • Do you monitor your current physical and virtual infrastructure and the application layer within the same process framework and with the same tools? You will need to provide these services in your private cloud as well, preferably via the same pane of glass and by the same people.
  • Do you have a ballpark idea of the cost of a VM you produce? Do your clients care about the cost, or does it not matter at all?

If the answer to any of these questions is no, then you have homework to do as part of your private cloud project.
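For the last question on the list, a ballpark figure does not need a full cost model. The sketch below is one hypothetical way to estimate the monthly cost of a VM from amortized host cost, running costs and average VM density. Every name and number in it is an invented placeholder, not data from a real environment.

```python
from dataclasses import dataclass


@dataclass
class HostCostModel:
    """Hypothetical inputs for a back-of-the-envelope VM cost estimate."""
    purchase_price: float        # hardware price per host, in any currency
    amortization_months: int     # period over which the host is written off
    monthly_running_cost: float  # power, rack space, licenses, support per host
    vms_per_host: float          # average number of VMs actually running per host

    def monthly_cost_per_vm(self) -> float:
        # Spread the purchase price over the amortization period,
        # add the running costs, then divide by the VM density.
        monthly_host_cost = (
            self.purchase_price / self.amortization_months
            + self.monthly_running_cost
        )
        return monthly_host_cost / self.vms_per_host


# Placeholder numbers, purely illustrative.
model = HostCostModel(
    purchase_price=24_000,
    amortization_months=48,
    monthly_running_cost=300,
    vms_per_host=20,
)
print(f"Ballpark cost per VM per month: {model.monthly_cost_per_vm():.2f}")
```

Even a crude model like this makes it visible how strongly the result depends on VM density, which is why knowing your actual utilization matters.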

Problem #2: treating the effort as if it were just an automation add-on on top of your existing virtualized environment. There are two issues with this approach:

  • Provisioning and decommissioning: without changing the underlying provisioning processes, your new private cloud offering will feel very similar to the existing (and loathed) physical HW provisioning. Equally important: as my favorite band once put it, “When the music’s over, turn out the lights”. You have to make sure that unused capacity is returned to the pool, otherwise you will run out of it very soon. And here lies a paradox: you have to gain your clients’ trust that they will get another compute node when they need it, and get it fast, and you need this trust by the time you roll out your first private cloud VM. (Otherwise they will cling to their assets regardless of whether they actually do anything with them.)
  • Procurement: imagine that a public cloud provider tells you to wait a few months for your next VM, claiming that they need to get the purchase approved, then run a procurement process to select the HW vendor, wait a few months, check whether there is a free rack in their DC, and only then will they be happy to serve you. You do the same when you advertise your fancy new stuff with lightning-fast provisioning (say two days vs. the current 4 months), then add 4 months in the fine print because you did not change the supporting procurement process. The key is to build capacity WITHOUT knowing who will use it. For this you will have to convince your finance department to run the shop as if it were a mini service provider, i.e. not insisting on distributing all costs “somewhere.”
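The procurement point is, at heart, simple arithmetic: if hardware takes months to arrive, the order has to go out months before the pool runs dry. Here is a minimal sketch, assuming demand grows linearly. The function name and all figures are hypothetical.

```python
import math


def months_until_order_needed(
    free_capacity_vms: int,
    monthly_demand_vms: float,
    procurement_lead_time_months: float,
) -> float:
    """Return how many months remain before a hardware order must be placed.

    Assumes demand grows linearly. A result of zero or less means the order
    is already overdue: the pool will run dry before new hardware arrives.
    """
    if monthly_demand_vms <= 0:
        return math.inf  # no growth, so no deadline
    months_until_empty = free_capacity_vms / monthly_demand_vms
    return months_until_empty - procurement_lead_time_months


# Placeholder figures: room for 120 more VMs, 20 new VMs requested per month,
# and a 4-month procurement cycle.
slack = months_until_order_needed(120, 20, 4)
print(f"Months left before the order must go out: {slack:.1f}")
```

A negative result is the situation described in the fine print: the provisioning may take hours, but the customer still waits for the hardware.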

 

Problem #3: driving the whole effort with an engineering-only mindset. “Build it and they will come” is carved into many project tombstones. You should not forget about the payload, and WHY and how this payload will be moved to the new environment. Ask the following questions:

  • What will motivate your clients to move to your new offering? The business could not care less whether a given workload runs on physical hardware, on a traditional VM or in your private cloud, especially when there is no established chargeback model in your company. Unless there is a compelling reason, any migration effort that takes key people away from creating new business functionality will be considered an impediment to their progress, i.e. it may be very slow.
  • How fragmented is the application portfolio from an infrastructure requirement standpoint? The more strictly you stick to the new standards to reduce build and maintenance complexity, the bigger the migration effort becomes, especially when a large amount of technical debt has piled up under these applications over the years. This creates a gap between the current and the target infrastructure. You need to find a sweet spot: a set of requirements that is large enough to matter and has a reason to move (e.g. when the vendor is no longer interested in providing the fig leaf called extended support that covers your ass in front of the regulators).

 

Problem #4: the human factor. Your colleagues are not “resources”, they are human beings with their own skills, fears and agendas. Consider these questions while you put yourself in their shoes:

  • Does creating a private cloud - let alone a containerized compute platform - require the same skillsets, i.e. the same people, as the old-school physical environment? Spoiler alert: it doesn’t.
  • What is the chance that your current staff will pick up the new skills fast enough? Chances are, if they had this skillset, they would be somewhere else already.
  • Are you prepared to create/hire (let alone retain) a dedicated team of automation engineers (in fact developers) and process people to make it happen? (reallocating 20% of the bandwidth of your existing people won’t cut it.)
  • Are you prepared to handle the compensation gap between the above mentioned two groups?

 The paradox is that you definitely need your existing staff to keep the business running. While some part of this crew will be prepared for the new technology and processes, some other part will fall behind and may even try to make the project fail. If you are still in the mood of creating your own private cloud after the questions above, here are a few considerations for you.

The technical considerations

  • Do the homework and define a minimum viable product that answers the needs of a double-digit subset of the existing application portfolio.
  • Walk before you run, start with a pure play IaaS, and support containers only in phase II.
  • Be prepared to keep the catalog very small. As Henry Ford once put it: “A customer can have a car painted any color he wants as long as it’s black”. One hypervisor, two guest OSes (RHEL and Windows are safe bets), SAN-based storage with basic HA and DR support, 3 T-shirt sizes, IaaS only.
  • On the other hand, be generous with RAM and be prepared to offer fast and reliable provisioning with functioning monitoring and management tools. Make sure you have enough bandwidth to serve these VMs.
  • Focus on seamless provisioning with a minimal number of manual steps. It makes little sense to have beautiful scripts spinning up the core VM image if it takes another day to apply all the missing patches or if your DNS propagation needs a day. Automation means using APIs rather than clicking in GUIs. Make sure your automation tools are in sync with those used by the application layer folks for their build process.
  • When you think about automation, treat all layers equally, i.e. include storage and networking in your automation efforts. Avoid the doom of Conway’s law, where the process breaks along the borders between the various units in your org that are responsible for creating a solution. (Imagine the VM automation process having to create a SNOW ticket to get a disk and an IP address.)
  • Be prepared to answer security considerations: can all apps coexist on the same physical hardware and subnet, or do you need physical isolation between application tiers?
  • Integrate with the existing core services like the corporate directory, firewall, monitoring tools and CMDB, while keeping the load balancers in scope.
  • Provide HA and DR support from the beginning. If the developers are accustomed to a storage-based DB replication, keep it. A naked VM might be good for PoC and testing purposes, but if it falls short compared to current offerings, it will be relegated to the above functions.
  • Go for overprovisioning and avoid reserved instances as much as the political environment allows.
  • Storage quotas are goodness, especially if they trigger a data lifecycle management effort and not just an outcry for more disks.
  • Create some rudimentary billing from the start. Make sure your stuff looks cheaper than the competing physical offerings. If the term cross financing comes to your mind, team up with your Finance colleagues and make it happen! (Of course, this helps only if there is a cross charge model in place already.)
  • If this project is a priority, then staff it accordingly. You cannot make it happen by reallocating 20% of your existing people’s time; that’s just tire kicking.
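Several of the points above (a tiny catalog, overprovisioning, rudimentary billing) fit in a few lines of code. The following is a sketch under stated assumptions, not a reference implementation: the three T-shirt sizes, the 4:1 vCPU overcommit ratio, the choice not to overcommit RAM, and the prices are all made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TShirtSize:
    name: str
    vcpus: int
    ram_gb: int
    monthly_price: float  # hypothetical chargeback price per VM


# Three sizes only; all figures are illustrative placeholders.
CATALOG = {
    "S": TShirtSize("S", vcpus=2, ram_gb=8, monthly_price=30.0),
    "M": TShirtSize("M", vcpus=4, ram_gb=16, monthly_price=55.0),
    "L": TShirtSize("L", vcpus=8, ram_gb=32, monthly_price=100.0),
}


@dataclass
class Pool:
    physical_cores: int
    physical_ram_gb: int
    cpu_overcommit: float = 4.0  # assumed vCPU-to-core ratio
    ram_overcommit: float = 1.0  # assumption: RAM is not overcommitted
    used_vcpus: int = 0
    used_ram_gb: int = 0

    def can_fit(self, size: TShirtSize) -> bool:
        cpu_limit = self.physical_cores * self.cpu_overcommit
        ram_limit = self.physical_ram_gb * self.ram_overcommit
        return (
            self.used_vcpus + size.vcpus <= cpu_limit
            and self.used_ram_gb + size.ram_gb <= ram_limit
        )

    def allocate(self, size_name: str) -> TShirtSize:
        # Anything outside the catalog raises KeyError: no exotic requests.
        size = CATALOG[size_name]
        if not self.can_fit(size):
            raise RuntimeError("pool exhausted: time to add capacity")
        self.used_vcpus += size.vcpus
        self.used_ram_gb += size.ram_gb
        return size


def monthly_bill(size_names: list[str]) -> float:
    """Rudimentary billing: sum the list prices of a client's VMs."""
    return sum(CATALOG[name].monthly_price for name in size_names)


pool = Pool(physical_cores=32, physical_ram_gb=256)
requested = ["S", "M", "L"]
for name in requested:
    pool.allocate(name)
print(pool.used_vcpus, pool.used_ram_gb, monthly_bill(requested))
```

The catalog being a fixed dictionary is the design choice here: a request either matches one of the three sizes or is rejected, which keeps the test matrix and the price list equally short.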

 

And finally, a few DOs and DON’Ts

  • Make the business understand the key value proposition of the whole thing: it is agility, not cost! Get their long-term commitment; going back to the budgeting table every time is time consuming. (Barring cases like the current virus-triggered economic meltdown…)
  • Understand the pain points of your user community and create a unique selling point by easing this pain. You need friends to make it happen.
  • Understand what the current application portfolio runs on (down to the Java framework versions, RDBMS versions, application server and OS versions, storage requirements) and be prepared to serve a core subset upfront while resisting every other demand at the start. Agree on an MVP and make sure you have something tangible to offer soon, while not committing to unrealistic deadlines.
  • Track and coordinate with other key projects affecting the infrastructure (e.g. a revamp of the firm’s directory, a firewall replacement or a simple DC move).
  • Do not start your project by buying a truckload of hardware. Most of your difficulties are not HW related anyway, and you might have to write the hardware down by the time the project is finished.

I would like to thank Gabor Illyes and Zeno Horvath for their insights on this topic. As always, I appreciate any feedback or comment.

 
