To a newborn, every joke is new, so I have specialised in old jokes and I tell them over and over again.

Floorshrink diaries


Please unload your gun before entering - revised edition

8 June 2026 - Floorshrink

the_prodigal_robber.png

 

I received several pieces of feedback on last week's musing that pointed out flaws in the post. Thank you again for the comments; here is the revised version.

The key observations:

  • In the post I conflated two different risks: the risks inherent in AI itself (e.g. political manipulation, automated decision-making based on non-transparent algorithms) and the risk that the EU falls behind in the AI race.
  • The side effects of AI are just as hard and expensive to handle after the fact as it is to retrofit IT security into an existing IT infrastructure. (Touché.)
  • The entry threshold has indeed dropped on the user side of AI, but not yet for the chip and data centre capacity needed to train the models (that still costs a fortune). This is a bottleneck similar to uranium enrichment for nuclear purposes: few manufacturers and a traceable supply chain (see Stuxnet).
  • “Unstoppable” and “uncontrollable” are not the same. The direction of development is indeed unstoppable (everyone agreed on this), but its speed and form can still be shaped. The Red Flag Act, for example, did not stop the spread of cars either; it only diverted their development to Germany.
  • If we accept that AI poses a risk comparable to nuclear weapons, then it must, and can, be handled at the same level.

 

Based on the above, here are my corrections:

  • I maintain that the EU has to step up in AI, because you cannot shape what you do not build. If the EU only regulates, but has no frontier training data centre capacity of its own, no model of its own and no skilled workforce of measurable size, it can only be a rule-taker, not a rule-maker.
  • The “hair of the dog that bit you” wisdom applies to AI as well: the AI offensive cannot be stopped, but the answer is not a ban, it is scaling up defence with the same tool. In other words, EU-level support in this area is very much needed. (Nothing is more beautiful than a malicious AI algorithm being caught by a benevolent one; automated defence is therefore mandatory, even though the attacker has the positional advantage.)
  • The pace of technological change is indeed impossible for regulators to follow, so instead of regulating the technology in detail (it will lap you anyway), we should measure the outcome and hold the AI vendor accountable for it. (I have a question mark here about the user's responsibility, see the earlier knife analogy, but that is another post.)
  • As for the US-China AI rivalry: although both are striving for global dominance and will go to the wall for it, they share an interest in keeping AI under control. The history of nuclear proliferation and its regulation shows that while a general agreement is unlikely, a narrow one, limited to avoiding mutual destruction, is very much possible. The real question will, of course, be how to verify the actions behind the declarations, because with AI there is no equivalent of the seismograph that detects covert nuclear tests.
  • The breakdown of the social contract and its consequences would happen even without AI. So it should be handled not through AI regulation but by retraining the workforce and, sooner or later, by rethinking the mechanisms of redistribution (see Universal Basic Income).

 

The memoirs of Kilgore Trout: Please unload your gun before entering

 

pls_unload_your_gun.jpg

AI, hand in hand with robotics (body and soul 2.0), is unstoppable, yet this is exactly what the EU is attempting by regulating it to a crisp. The expected consequence of this approach is that the EU will gradually lose competitiveness against the other two poles (the US and China). As a result, less money will be left for welfare state services (healthcare and pensions), which will lead to social tension, because the EU's ageing (i.e. ever more expensive to service) population regards these services as a birthright. These tensions play into the hands of populist extremists and may ultimately lead to the disintegration of the EU. This blog post collects the reasons why and argues that AI and robotics should be supported so that the EU can remain competitive (and, nota bene, not fall apart) after the upheaval caused by AI and humanoid robotics.

A few days ago I sat through a banking AI Security conference. The gist of the talks in one sentence: “Be careful with AI, especially with uploading source code, because it will very likely end up in the hands of the bad guys; let's rather hold back, that can't do any harm.” Shortly afterwards, in connection with an AI development project, a compliance expert set out in a multi-page document everything the application in question had to comply with before we had written a single line of code (GDPR, AI Act, ISO 42001, DORA, ISO 27017/27018, etc.). All this reminded me of a sign I once saw on the door of a pawn shop in Mississippi: “Please unload your gun before entering”. I imagined the bad guys unloading their weapons, perhaps even stuffing their ski masks into their bags, saying, “no robbery here today, then.” My problem is that the EU's mountains of rules slow down the development of AI and related fields, while the two big rival poles overtake us with the pedal to the metal, not to mention the bad guys (say, in Pyongyang) who blithely ignore the whole thing and will cause ever greater damage using AI.

The forces that prevent anyone from keeping AI under control

  • The latest arms race has begun. Every great achievement is preceded by a trigger. For the US these were Pearl Harbor and, later, Sputnik, answered by the atomic bomb and the Moon landing respectively. For China, the same role was played by AlphaGo's victory over Lee Sedol, and above all by the defeat of Ke Jie (then the world's strongest Go player) a year later, again by DeepMind's AI. The Chinese took up the gauntlet. A good indicator of this race is the trend in the number of patent applications filed.

    Speaking of weapons: autonomous drones, and drone swarms in particular, need AI both for target identification and for avoiding terrain obstacles. On the other side, defence against such attacks is equally unthinkable with traditional (human) decision-making; there is simply not enough time. One consequence of the Russia-Ukraine war that is already visible today is the rewriting of the rules of conventional warfare.

patent_applications.jpg

https://www.wipo.int/web-publications/world-intellectual-property-indicators-2025-highlights/en/patents-highlights.html (EPO: European Patent Office) 

  • The second driving force is money. In our case, a lot of money: just look at the pre-IPO valuations of Anthropic (965 billion USD) and OpenAI (852 billion USD), compared with, for example, Hungary's GDP last year of roughly 250 billion USD. The business opportunity in technological progress has set off a new gold rush. Those pesky capitalists can rub their hands on the buyer side of AI, too: by my own (subjective) estimate, a white-collar worker can achieve roughly a 20% efficiency gain by using a good AI well. At the same output, this could mean letting go of roughly one in five or six people in mildly intellectual but repetitive jobs within a few years. (See also Geoffrey Hinton's notorious career advice.)

    The icing on the cake is that the biggest customer of AI is the state itself, since implementing “Big Brother is watching you” also requires facial recognition and real-time voice analysis. China has 700 million cameras for 1.4 billion people, combined with a social credit (social scoring) system. Orwell would tip his hat… London is not far behind these ratios. Or, as the saying goes: “everybody pees in the swimming pool, but some do it from the diving board.”

  • The third force is ego. The feeling of “I did it” and “I was the one who did it” trumps everything. In Silicon Valley the motto is not “live and let live”; they watch Highlander instead (There can be only one). These people want to win at any cost. This is also where the observation belongs that the history of science knows of no case where something that could be done was not done.

  • Scientists' need to share knowledge: the history of computing shows what fantastic results the human urge to create can achieve (see Zuse, Atanasoff, Neumann, Shannon...). Academia is driven by publishing and by the need for those publications to be cited, with the consequence that breakthrough results become available to everyone within a very short time. Nation states cannot, and perhaps do not even want to, restrict this.

  • Society needs AI-supported progress. I do not mean Kurzweil's singularity, nor fusion-powered, orbiting quantum-computing data centres, but more prosaic things, e.g. pharmaceutical research results that can defeat antibiotic-resistant bacteria, or humanoid robots with facial recognition that walk into a burning house to rescue an injured person.
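The “20% efficiency gain” estimated in the money bullet above can be turned into a headcount figure in two ways, and they do not give quite the same answer. A minimal Python sketch; the 20% is the author's own subjective estimate, and the two readings of “efficiency” are my assumption:

```python
def headcount_freed(gain: float, interpretation: str) -> float:
    """Share of staff no longer needed if total output stays the same.

    gain: relative efficiency gain, e.g. 0.20 for 20%.
    interpretation:
      "output" - each person produces (1 + gain) times as much,
                 so the required headcount is 1 / (1 + gain).
      "time"   - each task takes (1 - gain) of the original time,
                 so the required headcount is (1 - gain).
    """
    if interpretation == "output":
        return 1 - 1 / (1 + gain)
    if interpretation == "time":
        return gain
    raise ValueError(f"unknown interpretation: {interpretation}")


for mode in ("output", "time"):
    share = headcount_freed(0.20, mode)
    print(f"{mode:>6}: {share:.1%} of staff, i.e. about 1 in {round(1 / share)}")
```

If 20% means “20% more output per person”, about one in six jobs becomes redundant (16.7%); only if it means “20% less time per task” is it exactly one in five.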

Why is it a problem if AI cannot be kept under control?

  • AI is not neutral: for years I subscribed to Wernher von Braun's view that science has no moral dimension. It is like a knife, which has a different effect depending on whether you put it in the hands of a surgeon or a murderer. AI does not follow this pattern; it carries a political charge, because it will shift the balance of power, which in turn breaks the implicit contract between the state and its citizens. The state cannot keep its promises (because it does not have enough money), and some groups of citizens do not want to play by the rules (because they can get away with it).

  • AI will democratise the ability to cause harm: e.g. an algorithm improves WannaCry, and next time it does not leave in the code the name of the unregistered domain that could be used to stop the encryption process; or it develops a new version every week and puts it on the market for peanuts compared with the damage it can cause.

  • Keeping technological innovation within bounds is getting ever harder: in the past, big breakthroughs were kept in check by their cost. (The Manhattan Project cost about 0.4% of US GDP at the time, and the Apollo programme had a similar cost relative to GDP 20 years later.) The entry threshold was high. This has changed: the cost of the key fields (AI, humanoid robotics, 3D printing and genetic code manipulation) is falling exponentially, making them accessible to an ever wider circle, including those intent on causing harm. In a shed lab, a handful of rogue scientists could Lego together a virus more effective (faster-spreading and deadlier) than anything before, or something could simply go wrong, as in that certain Wuhan lab in autumn 2019.

  • AI can decide elections: our Russian friends have for years been trying to influence every democratic election by spreading false information, and by now they can do so through thousands of AI-generated fake profiles (cheaper than a troll army). Naturally, they also produce AI-generated deepfake content for this (because it makes a bigger splash). The suitably riled-up voter then puts the X in the right place… (See e.g. Brexit, even though that was “only” data science with a Facebook frontend.)

  • Blocking key technological discoveries is harmful, because it weakens the state. The Gutenberg Bible appeared in 1455. The Ottoman Empire banned printing in Arabic script until 1727, thereby hindering the wide dissemination of scientific results for nearly 300 years. For a much shorter time, and with less damage, the British Locomotive Act (aka the Red Flag Act) also belongs here: with it, the British Parliament (or rather the horse-carriage lobby) managed to hand German car manufacturing a twenty-year head start.

  • AI develops faster than regulators can follow. Edward Wilson's observation, “We have Palaeolithic emotions, medieval institutions and godlike technology”, has never been more timely. Progress is accelerating exponentially, and legislators are not used to this speed: they think in years, even decades. (BTW, the US does not have an AI Act yet, but at least that way it cannot become outdated.)

So here is the Catch-22: constraining AI is impossible, yet it is badly needed. I am sure that with a one-sided approach that aims only at regulation, the EU is shooting itself in the foot. As a foot soldier, all I know is that within the EU we do not have a single chip maker, operating system vendor, cloud provider or AI provider that can compete at a global level. The EU pawn shop could only stretch to a sign on the door. That will not be enough to keep the robbers out. I found proposed solutions in a few brilliant books (e.g. Mustafa Suleyman: The Coming Wave and Yuval Noah Harari: Nexus), but I still have two dilemmas.

  • I feel about AI roughly the way the protagonist of the film Limitless feels after he accidentally gets hold of an experimental pill called NZT-48. NZT unlocks the brain's full capacity, giving the hero lightning-fast learning and unrivalled insight into connections. With this stuff, old warhorses could become fit for battle again (Hallelujah); we would just have to do something about the side effects.

  • The film An Inconvenient Truth summed up the expected effects of climate change well 20 (!) years ago. Yet the very country where the film was made, the US, withdrew from the Paris Agreement, which set out steps to reduce CO2 emissions. The EU's two key rivals consider the question of power more important than the problem of climate change. Why would it be any different with AI regulation?

If you have an opinion, please don't keep it to yourself!

Cheers, Laci

Sources:

 

 

 

Paulus in reverse gear

laszlok_ink_drawing_in_the_style_of_albrect_durer_depicting_a_m_6dee3ce6-9052-4f3a-a60c-e8ebbea90caa.png

Christmas was great because I had some time to read and listen (see the inputs below), and I distilled the outcome into this post. Something fundamental has changed that affects the cloud, so I have to revise my previous thinking.

Background on me: I was so thrilled by the fall of the Berlin Wall that I went to Berlin at my own expense to celebrate the 20th anniversary of the “Mauerfall” in 2009. I stood in the rain, watched the toppled dominoes and listened to Lech Walesa and Miklós Németh. In the mid-80s I had seen this wall as a symbol of being on the wrong side, so I was deeply moved on that November day. I believed in Europe as a concept and thought that the 20th century, when Europe (and Hungary within it) screwed up so badly, was behind us.

I also believed in another concept: I have been a cloud advocate since 2012 and I am proud of having been part of a cloud transformation at a local commercial bank. When I was asked to produce an exit plan for the new cloud platform, I wrote this: the more we use PaaS and SaaS instead of recreating traditional on-prem IaaS, the less likely it becomes that we will ever get out, unless we are willing to spend as much on getting out as we spent on getting in, which would kill the business case. There is no affordable exit from PaaS (let alone SaaS), and there is no material need for one. I added that not using PaaS services (in order to be ready to leave fast and cheaply) would be like taxiing an aircraft on the tarmac without ever taking off. I was convinced that the only real reason to exit would be a geopolitical meltdown. I closed my case by pointing out that in a shitstorm, under extraordinary geopolitical circumstances, Microsoft, Amazon and Google might be forced to suspend services, quite likely in a coordinated manner (on the same day). The whole thing seemed very unlikely. What I missed at the time was that an exit does not have to be economically rational; it might be strategically necessary.

A year later came Putin's attack on Ukraine, then came Trump (term 2), letting down first the Ukrainians, then Europe (for the record: Europe was indeed a free rider for decades, piggybacking on the US for its defence and keeping its welfare states afloat by underspending on the military). Then came the news about Venezuela, an “invitation to waltz” for any power that wants to turn the tables by force. Perhaps a geopolitical meltdown is no longer unimaginable.

Call it confirmation bias, but I interpreted last year's Draghi report as proof of my case, namely that the EU was inhibiting innovation with its overwhelming regulation, which should be trimmed to improve competitiveness. I knew for sure that the EU had missed the boat in several key areas, like chip design and production, public cloud, let alone AI. We cannot even show an EU Linux distro with a double-digit market share. (Ubuntu is from the UK...)

So here is Europe's Catch-22: US political volatility makes full reliance dangerous, yet Europe lacks viable near-term alternatives, and moving away from the hyperscalers would lock it into stagnation.

The current US leadership acts like a loose cannon, destroying partnerships that took decades to build. If we add that US society is divided to the point where the foundations of liberal democracy may be beyond mending, not to mention that social media on AI steroids makes it easy to hack an election, we might conclude that it is risky to place all our bets on US technologies.

On the other hand, name the most important breakthroughs of the last 80 years of information technology and count how many of the key innovations came from Europe. Go one step further: it is possible that the stagnation of EU economies (combined with ageing and shrinking populations, plus the need to spend more on the military) will evolve into social instability as soon as the welfare state becomes unsustainable in societies still in the denial phase. (The fact that Europe has only small countries and countries that have yet to recognise that they are small is another story.) The bad news: IT innovation has become very expensive lately; free competition has turned into “techno-feudalism” (copyright Varoufakis), with a handful of giants extracting rent from anybody with a smartphone and internet access (roughly half of the world's population), and here we are (that is, the EU) caught with our pants down in the coming storm. We do not have the funds needed to catch up. We have no short-term replacement for US technology.

If we stay on US technology, we risk being screwed; if we move away from it, we will surely lose any chance to increase our productivity and thereby stay in the race with the US and China. Sovereign clouds are just band-aids without an overhaul of the EU. Europe must invest in technology and act on the recommendations of the Draghi report (reduce regulation, build a unified capital market and focus on disruptive innovation instead of protecting what we already have), and it must swallow the bitter pill and re-prioritise the welfare promises that were made during a period of growth and security that no longer exists. And we must do all of this pretty fast. In the meantime, as a bare minimum, we have to keep a copy of our own data on prem, encrypted with keys we produce ourselves.
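Keeping an on-prem copy “encrypted with keys produced by us” is largely a key-management question. A minimal sketch, assuming a master key that is generated locally and never leaves our premises: a separate key is derived for each dataset with HKDF (RFC 5869), built here from the Python standard library only. The actual encryption (for example AES-256-GCM) should be done with a vetted cryptographic library and is deliberately not shown; the dataset labels are hypothetical.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key material is too long for HKDF-SHA256")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869: an absent salt means HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical setup: the master key is created on our own hardware.
master_key = secrets.token_bytes(32)
salt = secrets.token_bytes(16)  # stored next to the backups; it need not be secret

# One 256-bit key per dataset; the labels are made-up examples.
core_banking_key = hkdf_sha256(master_key, salt, b"backup:core-banking:2026")
crm_key = hkdf_sha256(master_key, salt, b"backup:crm:2026")
```

Because the derivation is deterministic, only the master key and the salt have to be kept to re-create every dataset key; the flip side is that losing the master key means losing the data.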

As always, I would be delighted to get your feedback on these thoughts.

Inputs:

  • Technofeudalism: What Killed Capitalism (by Yanis Varoufakis - 2024)
  • Kaput: The End of the German Miracle (by Wolfgang Münchau - 2024)
  • The Draghi report on EU competitiveness (2024)
  • Freedom - Memoirs 1954 – 2021 (by Angela Merkel - 2024)
  • Postwar: A History of Europe Since 1945 (by Tony Judt - 2006)

Related blog posts:

Redmond, we have a problem

 redmond_we_have_a_problem.jpg

It seems that the law of supply and demand does not work the usual way in the cloud-related job market: it creates behavioural distortions rather than a gradual move towards a healthy equilibrium. While the key players have seemingly declared victory and shifted their sights to the next battlefield, AI, this anomaly, combined with heightened scrutiny from regulators, might hurt adoption in the long run. This post aims to identify the root cause and to suggest possible ways out of the problem.

Symptoms - What’s going on here?

I have been tinkering with cloud implementations for several years. It baffled me that, despite cloud DevSecOps engineer compensation being 30+% above average, the inflow of talent into this area is far lower than anticipated.

The salary structure for an individual contributor DevSecOps Engineer varies based on geolocation, level of experience, and company size. Below is a table outlining the approximate salary ranges for different levels in various regions:

pay_ranges.jpg

  • Because of these pay differences, my team lost 10+ top-notch cloud/DevOps engineers in two years. Most of them went abroad, one as far as Vancouver. Others stayed at home but switched to foreign employers.
  • Some contractors sold 80% of their time twice, to two different customers; one of them was so unashamed that he listed his other job on his LinkedIn profile. (There are telltale signs of this behaviour: insisting on full home office, missing regular meetings and, later, deadlines.) Some elevated this practice to the company level…
  • Others played a fair game and told us upfront that they worked for multiple clients, carried 3 (!) notebooks (one for each client plus one for their own company; God bless virtual desktops…) and declared that they would not even pick up the phone on days assigned to their other clients.
  • The cloud IT market resembles the construction industry: two, sometimes three layers of subcontractors adding little value beyond their margin on the price tag.

The root cause

The wheel reinvented: this is an imbalance between supply and demand. What bugged me was that, despite knowing the impressive earning potential, less than one percent of the internal IT Operations workforce (at a Hungarian commercial bank) made a substantial effort to learn the new discipline. (Those who did soon left ITOps.) On the other side of the house, few developers made a career shift to become DevOps/IaC experts.

Gartner found a good demonstration of the problem in 2021 and dubbed it the IT Talent Quadrant. This matrix uses stack-ranked demand (the number of job postings asking for a given skill) vs. the number of such job openings per candidate as its dimensions, and provides evidence that Kubernetes, Infrastructure as Code and automation are critical ingredients of any cloud implementation. For some reason, Gartner has not updated this chart since 2021.

it_talent_quadrant.jpg

I created my own explanation, the Commitment Matrix, which uses the seller's commitment to his/her employer vs. the buyer's commitment to its employee as dimensions. (In some cases, the seller and the employee are the same person.)

commitment_matrix.jpg

In most cases, any new skill gradually shifts from being Spice to being a Cornerstone, and later drifts into the Majority. For some reason, in the case of the most wanted cloud expertise, this shift is just not happening.

The reasons

  • The cloud is an expanding universe: more and more large companies are making inroads, generating new demand for experienced people.
  • Buyers would love to have Spice people on their staff but are not willing to pay the requested premium, claiming that it would create internal salary tensions (or simply setting the compensation ceiling too low). On the other hand, the very same corporations are willing to pay twice as much for the same people as contractors.
  • Top engineers do not want to work for a large firm as rank and file; they pledge allegiance to their boutique consulting firms instead. A smaller firm means a more direct link between a person's contribution and the firm's performance, which results in perks up to partial ownership, not to mention that being among great technical peers is nirvana for an engineer.
  • Reaching Spice level requires extensive learning and practice, and the industry dictates a breakneck pace: your knowledge becomes obsolete within 4-5 years unless you keep updating it. Cloud DevOps and Security are good examples of the Pi-shaped skill set. One needs to understand traditional development principles like branching or pull requests (and must write decent code), while also being familiar with the nitty-gritty of name resolution in a hybrid environment with private endpoints, or with how a policy set interacts with the underlying Terraform codebase.
  • The last reason may lie in etymology: Dev + Ops is like mixing oil with water. Development is akin to creating something new, thus experimenting with the unknown: little predictability and a high level of autonomy. Operations, on the other hand, hinges on a high level of predictability and minimal autonomy. I have already used the modified Wardley map to depict this divide, but it is worth repeating.

modified_wardley_map.jpg
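The “nitty-gritty of name resolution in a hybrid environment with private endpoints”, mentioned among the reasons above, mostly comes down to one question: which DNS server answers for which zone? A minimal Python sketch of the longest-suffix conditional forwarding an on-prem DNS server performs; the zone list and all IP addresses are hypothetical. It assumes the setup I understand Microsoft recommends: the service's public zone (e.g. `blob.core.windows.net`) is forwarded to an Azure-side resolver, so that the `privatelink` CNAME chain resolves to the private endpoint instead of the public IP.

```python
# Hypothetical forwarder table of an on-prem DNS server in a hybrid setup.
# All IP addresses are made-up examples. The 10.100.0.4 entry stands for the
# inbound endpoint of an Azure-side DNS resolver.
FORWARDERS = {
    "blob.core.windows.net": "10.100.0.4",   # Azure Storage behind private endpoints
    "database.windows.net": "10.100.0.4",    # Azure SQL behind private endpoints
    "corp.example": "192.168.10.53",         # on-prem directory DNS
}
DEFAULT_RESOLVER = "192.168.10.1"  # everything else: the on-prem recursive resolver


def pick_forwarder(fqdn: str) -> str:
    """Return the resolver for a name, using the longest matching zone suffix."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try the most specific suffix first: a.b.c.d, then b.c.d, then c.d, then d.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in FORWARDERS:
            return FORWARDERS[zone]
    return DEFAULT_RESOLVER


print(pick_forwarder("mystorage.blob.core.windows.net"))  # goes to the Azure-side resolver
print(pick_forwarder("www.example.org"))                  # goes to the default resolver
```

Trying the longest suffix first is what lets a specific zone override a broader one; getting such a table subtly wrong is a classic source of “it works from Azure but not from the office” tickets.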

To make things worse, top-notch developers disregard scripting languages and look down on the non-functional side of the house, like a private DNS resolver or cross-regional site recovery. My hunch: they do not care about these things, let alone know much about them, and want them as a service. From time to time I give guest lectures at universities. On one occasion I asked the participants (50+ BSc students in their graduation year) about the power consumption of an Intel server. No idea. How about a notebook? No clue. A hair dryer? One girl knew it. Infrastructure is not sexy, not even when it becomes code.

Ways to handle this problem

There are multiple stakeholders in this game with multiple paths to follow.

Vendors - Reduce complexity

I picked Kubernetes as the proverbial veterinarian's horse (the textbook case that shows every symptom) to illustrate the problem. People who have dealt with Kubernetes and its automated deployment and configuration can attest that it is complex to implement and to run, even without the ingress headaches with private endpoints or a service mesh on top of it. For this reason, Microsoft offers Azure Container Apps (ACA) and lightweight alternatives like Azure Container Instances (ACI), while still allowing plain vanilla implementations on VM scale sets for masochists.

The downside is that this simplification comes at the cost of some configuration, security and monitoring capabilities. As a dreamer, I wish we had a universal serverless compute resource at hand, like Azure Functions or AWS Lambda. „Liberté, Égalité, Fraternité” for cloud computing: „Autoscaling, Resiliency and Security”.

Another approach is to hide the internal complexity altogether by moving to PaaS and, in many cases, to SaaS. This is exactly what Microsoft is doing, e.g. with the items bundled into Fabric. The issue: the deeper you walk into the cloud forest (wandering into SaaS territory), the less likely you are ever to come out. This reduction of complexity is not evil by definition; one could argue that it helps IT create business value faster. But there is a catch: a senior executive of a large bank once asked me what the biggest danger in cloud computing was. My answer was: politicians on either side of the Atlantic going crazy. Two years ago, I meant it as a joke…

Service providers - Hide complexity

Complexity and the skills shortage provide a business opportunity. In practice, this means creating a layer between the offerings of the hyperscalers and their enterprise customers. This toolbox is a combination of landing zone blueprints, an IaC code base for cloud services, integration solutions that connect the cloud instance with its on-prem counterpart (covering networking, identity management, service management and monitoring) and automatically deployed policy sets that streamline compliance audits.

hide_complexity.jpg

While starting each implementation from scratch has its short-term financial advantages (if you are the one selling this service), only the thin top layer of customers can afford it.

Warning: your Spice people are your golden goose, and not just because of the profit you make on their billed hours. Ignoring the foundations, or screwing them up, will lead to flawed implementations that will haunt you either as a security breach or as a hard-to-run environment that your client will hate.

 

Engineers - Thrive on complexity

The revolution in the infrastructure platform arena (software-defined storage, network and compute; Infrastructure as Code) is the marriage of two previously distinct disciplines. This is reflected in the compensation data for cloud architects and DevSecOps engineers on Glassdoor: 130k to 230k USD gross per year in the US. The rule of thumb is that whatever a top-notch IT skill costs in NY or London, you will get the same for one third of the price in Budapest; voilà, a flourishing Shared Service Centre business. So we are talking about 50-60k USD gross per year for a good cloud DevOps engineer or cloud security expert. The emphasis is on good. There is a never-ending debate about the relevance of certificates. I recall a top-notch colleague at Microsoft from the last century: when I nudged him about his certs (or rather the lack of them) and offered to cover the cost of any MCP exam, he literally threw his MCSD certificate on my desk within two weeks. (He left the country 10+ years ago…) If you are good, the certs are doable and a good advertisement, but it is true that certs alone are not enough. So, folks, learn and experiment! Cost is not an obstacle: a Coursera (or e.g. A Cloud Guru) subscription is 30 USD a month; time is the problem.
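The rule of thumb above can be checked against the quoted Glassdoor range with a few lines of arithmetic. A minimal Python sketch; the one-third factor is the post's rule of thumb rather than a measured ratio, and the 55k USD reference salary is my assumption (the midpoint of the 50-60k figure):

```python
# Glassdoor range quoted in the post for the US, gross USD per year.
US_RANGE_USD = (130_000, 230_000)
# The post's rule of thumb ("one third of NY/London"), not a measured ratio.
BUDAPEST_FACTOR = 1 / 3


def budapest_equivalent(us_salary: float, factor: float = BUDAPEST_FACTOR) -> float:
    """Apply the one-third rule of thumb to a US salary."""
    return us_salary * factor


low, high = (budapest_equivalent(s) for s in US_RANGE_USD)
print(f"Budapest equivalent: {low:,.0f} to {high:,.0f} USD gross per year")

# Learning cost: a 30 USD/month subscription, compared with an assumed
# 55k USD salary (the midpoint of the 50-60k figure in the post).
yearly_learning = 30 * 12
print(f"Subscription: {yearly_learning} USD/year, "
      f"{yearly_learning / 55_000:.1%} of a 55k USD salary")
```

The rule gives a band of roughly 43k to 77k USD, so the 50-60k quoted for a good engineer sits comfortably inside it, while a year of subscription costs well under 1% of such a salary.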

For the record: it is not all roses, as the chart below shows (I found similar data for Hungary). The IT job market is not as pretty as it used to be, but this fact only reinforces my mantra: learn and experiment to stay ahead of your competition.

job_postings.jpg

Another caveat is industry and location. Your compensation depends on the impact of your work on the outcome and on the profitability of the sector you operate in. The effect is a bit sad: healthcare and education could make good use of top-notch IT if they could afford it.

profitability_vs_demand.jpg

Legend has it that when the famous bank robber Willie Sutton was asked by a reporter why he robbed banks, he replied matter-of-factly, “Because that's where the money is!” In the next chapter we will look at the second core problem with the cloud: hyperscale providers being greedy and siphoning profit out of the value chain.

As always, I appreciate your feedback.

 

Sources

 

Horseshoe bend #6: Galileo Galilei

galilei.jpg

This post is an attempt to identify the root cause of the apparent divide between the two major branches of IT and to offer a remedy (ambitious, isn't it?). As always, this coin has two sides, so I would like to hear the views of both the Dev folks and the IT Operations folks.

Contradictions

During the 2+ years of running the cloud transformation at a commercial bank, I faced conflicting views on the following aspects of how IT should function.

contradictions.jpg

  • One extreme argued that the cloud is just another data centre and should therefore be treated the same way as our own: same (ticket-based) processes, same technologies (i.e. nothing beyond what we already have on prem) and, most importantly, the same speed of letting new things in.
  • The other extreme proclaimed that the cloud would change most aspects of IT as we know it: we should replace the stop signs (approvals) with guardrails (policies), automate every aspect of our daily work and, most importantly, treat IT infrastructure as a product that we want to sell to our (internal) clients.
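The “guardrails instead of stop signs” idea can be sketched as policy-as-code: instead of a human approving a ticket, every proposed resource is checked automatically against a set of rules, and compliant changes go through without waiting. A minimal Python illustration; the rules are hypothetical and the property names only mimic Azure storage account settings. In practice this job is done by tools such as Azure Policy or Open Policy Agent, not by hand-written code like this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrail:
    """A named rule that inspects a proposed resource definition."""
    name: str
    check: Callable[[dict], bool]  # returns True if the resource is compliant


# Hypothetical rule set; property names are modelled on Azure storage settings.
GUARDRAILS = [
    Guardrail("no public network access",
              lambda r: r.get("publicNetworkAccess") == "Disabled"),
    Guardrail("minimum TLS version 1.2",
              lambda r: r.get("minimumTlsVersion") == "TLS1_2"),
    Guardrail("owner tag present",
              lambda r: bool(r.get("tags", {}).get("owner"))),
]


def evaluate(resource: dict) -> list[str]:
    """Return the names of violated guardrails (an empty list means compliant)."""
    return [g.name for g in GUARDRAILS if not g.check(resource)]


def deploy(resource: dict) -> str:
    """Deploy automatically if compliant; no ticket or CAB meeting involved."""
    violations = evaluate(resource)
    if violations:
        return "rejected: " + ", ".join(violations)
    return "deployed"


print(deploy({"publicNetworkAccess": "Disabled", "minimumTlsVersion": "TLS1_2",
              "tags": {"owner": "team-a"}}))
print(deploy({"publicNetworkAccess": "Enabled", "minimumTlsVersion": "TLS1_2"}))
```

The design point is that the approval moves from a person to the rule set: the rules are reviewed once, and every change that satisfies them no longer waits in a queue.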

I recall that 28 years ago, while in charge of introducing Exchange 4.0 at a local commercial bank, I tried to explain to a deputy general manager that printing and faxing each and every e-mail he sent (to make sure the other party received it) was suboptimal and that a read receipt was enough (not kidding). I cared more about the consulting revenue than the rain forests, so I dropped the case. At the same bank I had arguments with the network folks that tracking the IP address of every Windows desktop in a paper-based grid notebook was suboptimal compared to DHCP. I did not drop this one, and kept gaining friends until they ran into issues due to duplicated IP addresses and discovered the joy of troubleshooting them (then they relented…). It has been bugging me ever since why it took them so long to realize these things. Why is it so darn hard to embrace change?

The psychology of IT Operations

psychology_of_it_ops.jpg

One day a manager in IT Operations asked me how many times I could recall IT Ops being praised by senior leadership for things running normally (rarely), versus how many times I remembered them being reprimanded after a major service outage. To be honest: every single time. I had to agree with his point: IT Operations is strongly incentivized NOT to change what works, since the bulk of issues are connected to changing some aspect of the service. Hence the need for a CAB (Change Advisory Board) in ITIL. The root cause of the pushback against change is the deep belief that speed and stability are opposite ends of the same dimension.

At this point I have to borrow a page from the book of Matthias Patzak, who in turn borrowed a page from Simon Wardley and tweaked his map by changing the vertical axis (visibility) to autonomy. Here is a modified Wardley map explaining why change agents are at odds with IT Operations. (a proposed remediation is on the chart)

wardley_map.jpg

The question is unavoidable: How can the infrastructure stay unchanged when everything that uses it changes at an unprecedented speed? My hunch: it cannot. The rest of this post is an attempt to prove this point.

The stakeholders’ view

The voice of the customer – in our case, the app dev teams:

  • Putting the cognitive load on the customer of the service, i.e. making a developer figure out the internal processes of the service provider (e.g. filing separate ServiceNow tickets for the VM, the OS, the RDBMS, the DNS entry, the domain join and the admin access), is a guaranteed customer satisfaction killer. It is like Vogon poetry (the 3rd worst in the Universe). Dissatisfaction is the hotbed of shadow IT. For the record, this is not just true in IT: Ferruccio Lamborghini would probably have stayed with his Ferrari (and his tractor business) if Enzo Ferrari had been a bit nicer to him or had made better clutches.
  • Lack of speed and autonomy leads to disengagement. I recall a developer who wanted to test a new feature of MS SQL Server. It took 3+ months to get him a test server, by which time he had given up on the idea he wanted to test in the first place. (He knew that the test bed he was asking for would have taken about an hour to set up if he had been given the chance. But he wasn’t.)

The voice of the business

  • Top management is concerned about unforeseen changes that may have a devastating impact on the survival of the enterprise. Their worries are backed by data: according to the Corporate Longevity Forecast, the time a company spends on the S&P 500 list is shrinking. In plain English, even large, established companies can disappear from the list, or become an “also-ran”, within a few years. (Nokia, Credit Suisse, GE, Qualcomm bidding for Intel, WTF?) The age of creative destruction is upon us: what worked for decades may not be good enough in the next ten years.
  • Enterprises are trying to prepare for, and respond quickly to, attacks from any new force in the market. The cloud is one of their bets. All parties but one agree on the following:
    a cloud transformation will deliver its value proposition only if the organization and the underlying processes are changed along with the technology.

When money talks - R&D budgets

  • If we assume that most technology companies spend the same portion of their revenue on R&D, and that this R&D has the same impact on the bottom line (sometimes not true), then a company with a larger revenue also spends more on R&D in absolute terms, and more R&D (when it leads to a breakthrough) results in a quantum leap in profitability.
  • If a firm catches one of these quantum leaps in a lifetime, it is lucky. If it catches two, this has long-lasting consequences for the entire industry. (Data points from 2023: IBM made 8.18 billion USD net income; in the same period HPE made 2 billion and Microsoft 86 billion.) The cloud race is over, the AI race has begun, and the hyperscalers have more money to spend on it than their traditional competitors.

statista_stats.jpg

Source: STATISTA.com (data for HPE starts only in 2015, when it separated from HPQ)

I feel it in my fingers, I feel it in my toes (change is all around you…)

The following chart visualizes how a team obtains infrastructure for an app. Say the dev team working on App 1 wants an application server with some compute power, an SQL DB, an OS and a VM underneath, and the whole thing should be accessible to clients via the web. In a traditional org this would mean 5 separate ServiceNow tickets with manual handovers between them. E.g. the virtualization folks would set their ticket status to done, a human being would intercept this change and file another ticket for the OS team to install the OS. These teams are measured on meeting their SLAs, so they would close their ticket even if the client is not able to log on to the server. (After all, identity management is a separate step, right?) Imagine a car dealer who tries to sell you an engine, a transmission, a few wheels and a body as separate items when you wanted a car…

the_tale_of_5_snow_tickets.jpg
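The five requests above form a dependency chain: the OS needs the VM, the database needs the OS, the application server needs the OS and the database, and web access needs the application server. IaC tools such as Terraform derive a similar dependency graph from the resource definitions and walk it in a single run. The sketch below only illustrates that ordering step under stated assumptions: the resource names are hypothetical, and it uses Python's standard `graphlib.TopologicalSorter` rather than any real provisioning API.

```python
from graphlib import TopologicalSorter

# Hypothetical resources for "App 1"; each key maps to the set of resources it depends on.
# In the ticket-based world, each entry would be a separate ServiceNow ticket.
APP1_STACK = {
    "vm": set(),
    "os": {"vm"},
    "sql_db": {"os"},
    "app_server": {"os", "sql_db"},
    "web_access": {"app_server"},
}


def provisioning_order(stack: dict[str, set[str]]) -> list[str]:
    """Return one valid creation order (dependencies first).

    This mimics the planning step of an IaC engine: build the dependency
    graph, then walk it in topological order. Raises graphlib.CycleError
    if the definitions contain a circular dependency.
    """
    return list(TopologicalSorter(stack).static_order())


if __name__ == "__main__":
    print(provisioning_order(APP1_STACK))
```

In the ticket-based world each edge of this graph is a manual handover between teams; in the IaC world it is just an edge the tool resolves on its own.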

In a cloud infrastructure, it is a set of IaC scripts that run at once. And here comes the problem:

  • This automation could be built by a dedicated cloud group, but that requires an org change that goes against the will of the existing org units. Injecting the SNOW tickets into the belly of the automation, with the same 5-day SLAs, would take the same time as the traditional setup. If you are against the cloud, all you need to do is insist on sticking to the old process.
  • You can grant the developer teams the right to execute this automation themselves, but it would mean relinquishing control and shifting to creating and maintaining the automation scripts and establishing guardrails (policies) instead of stop signs.
  • Creating and maintaining IaC code, CI/CD pipelines and policies (some people might call it DevSecOps and Site Reliability Engineering) requires new skills and can be seen as a threat by those not interested in the above changes.
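The point about injecting tickets into the automation can be turned into a back-of-the-envelope calculation. The sketch assumes, hypothetically, five sequential steps, each ticket gate taking its full 5-day SLA counted as calendar days, and 9 minutes of automation per step (45 minutes in total); real queues are messier.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class Step:
    name: str
    run_minutes: float          # time the automation itself needs
    ticket_sla_days: float = 0  # 0 means no manual ticket gate in front of the step


def lead_time_days(steps: list[Step]) -> float:
    """Worst-case lead time when the steps run one after another.

    Assumption: every ticket gate takes its full SLA before the step runs.
    """
    total_minutes = sum(s.ticket_sla_days * MINUTES_PER_DAY + s.run_minutes for s in steps)
    return total_minutes / MINUTES_PER_DAY


NAMES = ["vm", "os", "sql_db", "app_server", "web_access"]

# Hypothetical figures: 9 minutes of automation per step, 5-day SLA per ticket.
fully_automated = [Step(n, run_minutes=9) for n in NAMES]
gated_by_tickets = [Step(n, run_minutes=9, ticket_sla_days=5) for n in NAMES]

if __name__ == "__main__":
    print(f"automated:    {lead_time_days(fully_automated):.2f} days")
    print(f"ticket-gated: {lead_time_days(gated_by_tickets):.2f} days")
```

Under these assumptions the automated run takes 45 minutes, while the ticket-gated version takes just over 25 days, roughly 800 times longer, even though the code being executed is identical.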

All in all, an innocent technology change proposed by the cloud would require organizational, procedural and skillset changes in an org that does not like change.

There is an interesting observation in the State of DevOps report for 2023: the more frequently you make changes, the more likely they are to succeed. The root cause is simple: more frequent, smaller changes (with a working rollback) touch fewer things that can go wrong. Turn it around: the more worried you are about changing the platform, the more time passes between changes, the more moving parts each change gathers, and that in turn increases the likelihood that something will indeed go wrong.

number_of_changes.jpg

A side effect is that this will make your environment less secure (“I will not apply that security hotfix because it might break the application”; to be honest, sometimes it will) and will accumulate more technical debt.
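The batch-size argument can be illustrated with a toy probability model. It assumes (a simplification of my own, not a finding of the report) that every individual change fails independently with the same hypothetical probability p. A release that bundles k changes fails if any of them fails, i.e. with probability 1 - (1 - p)^k, and a failed release leaves k suspects to investigate.

```python
def release_failure_probability(p: float, k: int) -> float:
    """Probability that a release bundling k independent changes fails.

    Assumes each change fails independently with probability p;
    the release fails if at least one of its changes fails.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    p = 0.02  # hypothetical per-change failure rate
    for k in (1, 10, 50):
        risk = release_failure_probability(p, k)
        # k is also the number of candidates to check when this release breaks
        print(f"batch of {k:>2} changes: {risk:.1%} chance of failure, {k} suspects")
```

Note what the toy model does and does not say: over a quarter, the expected number of faulty changes is the same however you batch them (p times the number of changes). What grows with the batch size is the chance that any given release breaks and the number of candidates you must dig through to find the culprit.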

There is an expression that is the tell-tale sign of a siloed organisation: “he is criss-crossing in my backyard”, i.e. trespassing into a territory that the speaker considers his home turf. “Any time you start something new like [an innovation – eg. the cloud initiative], that cuts across many areas, there’s a potential for people feeling like you’re in their backyard.” (Michael Britt) The problem is that most value creation processes involve multiple departments, therefore one cannot innovate without “trespassing”.

A few days ago I got into a conversation with the cloud transformation lead of a large commercial bank. He made an observation that struck a chord: only a minuscule portion of the IT Operations workforce (in this bank) embraced the cloud; the rest honestly believed that everything was okay and this cloud thingy was unnecessary, and responded accordingly. I think Amara's law is at work here: "We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run." I am biased in this case, but I believe they underestimate the impact of the cloud and miss the opportunity to increase their market value.

Squaring the circle - the way forward

  • The known knowns:
    - the business hates it when cost grows faster than revenue. READ: the days of extensive growth in IT staff are over. (If you care about the whys, check out "Red Plenty".) There is one way forward: automation.
    - it is likely that those willing to marry stability with speed will gain the upper hand over those who stick to their guns and obstruct change.
  • The known unknowns: 
    - technology will create as many jobs as it eliminates (a recent study by the Guardian suggests that it creates more than it destroys). What is unclear is which jobs will stay and which will transform into something new. My bet is that the mundane ones (repetitive ticket crunching) will fade, while those requiring more thinking (e.g. designing the guardrails mentioned above) will grow in relevance.
    - large IT shops carry an enormous amount of legacy: the applications that generate the vast majority of business value for the enterprise today. It is difficult to forecast when the above shift will happen and how long it will take.
  • The unknown unknowns: 
    - IT Operations can hold the business at gunpoint claiming that any org/process change will pose a threat to the current stability of the business, therefore any cloud adoption should happen on their terms and at a speed deemed suitable by them. The real unknown is how long IT Ops can resist the push from their own internal clients and the hyperscalers. (make no mistake: the stick will follow the carrot soon.)
    - for the record: while industry disruptors are already doing it, my prognosis that technology allows for speed while maintaining stability is not yet proven in large enterprises carrying a legacy. 

Famous last words: In 1633 Galileo Galilei had an unpleasant encounter with the Roman Inquisition, which forced him to recant his claim that the Earth moves around the Sun rather than the other way around. Legend has it that after leaving the courtroom he murmured "Eppur si muove" ("and yet it moves"). He spent the rest of his life under house arrest.

As always, I will be glad to learn about your feedback.

The memoirs of Kilgore Trout nr. 6: the elephant and the snake

Twelve years ago, I gave a presentation at the Budapest University of Economics. I used a slightly modified drawing from The Little Prince (the boa constrictor that swallowed the elephant) to illustrate the income-over-time curve. Warning: your government wants to keep you as a net contributor to the pension system, while you may want to have a few more good years. In 2012 it seemed like a funny thing; today it looks like a problem. Considering the likelihood that your pension will cause a significant drop in your standard of living, your goal is to push the blue milestone to the right while retaining (some of) your market value.

the_elephant_and_the_snake.jpg

How readily you accept the above curve depends on your age and financial status, but the first reaction is usually that it is wrong: “torque overcomes RPM”, experience rules, etc. Bottom line: the market is wrong. If you keep in mind that “the customer is always right”, you might become interested in the root causes of this devaluation and in what we can do about them. If you are under 40, stop reading; if you are over 50, you might want to read on.

your_market_value.jpg

The components of your (job) market value

  1. Your experience – which doctor would you pick for heart surgery on your kid? A newbie (who is eager to do it) or the 40+ year-old guy with 15+ years of proven track record? The untold part of the story is that you do not want a 70-year-old dude with a trembling hand to do the operation either. OK, we are talking about IT, but keep in mind that Oppenheimer was 39 when he joined the Manhattan Project… The problem is twofold: your experience gets amortized AND you are unwilling to let it go to make room for new skills and new experiences. You need to learn new things, and learning gets harder as you get older. There is a potential escape route here: move to areas where the half-life of your skills is longer, i.e. away from hard-core IT towards something softer like process or project management, or farming watermelons. The issue is that this area is already overpopulated with refugees, bringing the prices down. Another way is to move up in the hierarchy, but that comes with the unavoidable and undesirable jostling for positions (then hustling the pretenders…).
  2. Your network – to be precise, the few key people in it who act as your sponsors are vital to your career. These are the people who trust you, who put a bet on you and who will speak up for you in that vital moment when a decision is made about you (or not you). Side note: this is one of those things where size does not matter, quality does. And now the bad news: like it or not, your network ages with you, which means that those who know what you are capable of might no longer be in a position to stand up for you.
  3. Your college degrees – I was a diploma collector once (3 university degrees). Then one day I asked myself when I had last used a Fourier transform, or whether I could still use my coding skills in Z80 assembly. Diplomas in technology get amortized fast. The real value from those years is your capability to learn and the seeds of your network.
  4. Your language skills – whenever I meet an IT person who claims that not speaking English is okay, I lose my marbles. 90+ % of literature in information technology is in English… Bad news: the upper 25% of the new generation speak two languages before entering college. (The only area where I put a heavy demand on my kids was a high-level language cert in ENG and GER by the end of their high school. Okay, I also put some emphasis on math…)
  5. Your appetite for 60+ hour work weeks – being a workaholic is nothing to be ashamed of (been there, done that), albeit it will have consequences for your relationships with your loved ones. As the adage goes, the only people who will remember that you worked that much will be your kids, not your boss. This appetite will surely calm down a bit around 60.
  6. Your ability to learn and to forget – most folks accept that the half-life of any technology-related skill is around 10 years, which means you will have to reinvent yourself at least 3 times during your active years. What many folks do not think about is that one has to “unlearn” the old ways of doing things in order to be able to absorb new ones.
  7. The logical multipliers:
    • Your appetite for power – you cannot be a leader without hungering for the right to make decisions. You will not be a great leader if all you care about is power and not your people.
    • Your health – although I accept the gene lottery idea, I think there are a few basic rules you need to play by: very little alcohol, no smoking, no drugs, enough sleep, lots of physical exercise and a wonderful woman (man) by your side.
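The half-life figure in item 6 lends itself to a quick calculation. The sketch assumes, for illustration only, that a skill's market value decays exponentially with a 10-year half-life and that an active career lasts 40 years; both numbers are assumptions, not measurements.

```python
import math


def remaining_value(years: float, half_life: float = 10.0) -> float:
    """Fraction of a skill's market value left after `years`,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (years / half_life)


def reinventions_needed(career_years: float, half_life: float = 10.0) -> int:
    """Number of half-life boundaries strictly inside the career,
    i.e. how often the skill set halves in value before you retire."""
    return max(0, math.ceil(career_years / half_life) - 1)


if __name__ == "__main__":
    for t in (10, 20, 30):
        print(f"after {t} years: {remaining_value(t):.1%} of the original value")
    print("reinventions in a 40-year career:", reinventions_needed(40))
```

With these assumptions a skill set is worth half its original value after 10 years and an eighth after 30, and a 40-year career crosses three half-life boundaries, which matches the "at least 3 times" estimate.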

Bottom line: to a large extent the market is right about reducing the market value of people over 50-55. On the other hand, it is wrong about rejecting older folks upfront without any consideration. I recall a disaster at Liptovský Mikuláš in Slovakia in 2004, when a storm literally erased an entire forest due to one thing: all the trees in that forest were of the same type, planted at the same time. Old trees are a must in any forest. (The picture below is my own.)

liptovsky_mikulas.jpg

The cost side of the house

Homo Economicus, beware: I dropped minor things like inflation and mortgages, but I did consider items like moving to a smaller home once you become an empty nester, and I inserted luxury items like a costly divorce into the mix.

the_cost_side_of_the_house.jpg

Houston, we have a problem: this curve does not look like a snake that swallowed an elephant.

What to do about this problem?

There is a gap between the income and the cost curves. If we accept the definition of happiness as minimizing the gap between one’s desires and one’s reality, we have three choices:

  A. lower the bar of your desires and expectations
  B. stay on the job market longer and reduce the degradation of your market value
  C. increase the portion of your income coming from your savings

Option A is not as bad as it sounds. I have first-hand experience of moving from a 6-cylinder BMW to a 3-cylinder Mini Cooper without any mental or manhood degradation. Fancy objects (cars, watches, gadgets etc.) are not essential to your happiness; collecting an excessive amount of them even suggests that you are compensating for something.

Option C is by far the best. The only caveat is that only a minority of the working population reaches “escape velocity”, the point where they can afford to do charity work to save baby seals and rainforests (besides being angel investors, since they want even more money). OK, what about the rest?
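One way to put a number on “escape velocity” is to ask which share of your annual costs your savings can cover on their own. The sketch below is a deliberate simplification with hypothetical inputs: it assumes a constant annual return and ignores inflation, taxes and the gradual depletion of capital.

```python
def savings_coverage(capital: float, annual_return: float, annual_cost: float) -> float:
    """Share of annual living costs covered by investment income alone.

    Assumes a constant annual return; ignores inflation and taxes.
    A value of 1.0 or more means "escape velocity" in this toy model.
    """
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return capital * annual_return / annual_cost


def capital_for_escape_velocity(annual_cost: float, annual_return: float) -> float:
    """Capital at which investment income alone covers the annual costs."""
    if annual_return <= 0:
        raise ValueError("annual_return must be positive")
    return annual_cost / annual_return


if __name__ == "__main__":
    cost, rate = 30_000, 0.04  # hypothetical annual costs (any currency) and return
    print(f"capital needed: {capital_for_escape_velocity(cost, rate):,.0f}")
    print(f"coverage with 300,000 saved: {savings_coverage(300_000, rate, cost):.0%}")
```

With hypothetical annual costs of 30,000 and a 4% return, escape velocity needs 30,000 / 0.04 = 750,000 in capital, while 300,000 covers 40% of the costs. Everything below 100% has to come from options A and B.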

salary_vs_return_on_investment.jpg

So here we are: the market is mostly right, and becoming a follower of Siddhartha solves only part of the problem. Here are the ingredients for preserving your livelihood over 55:

  • Drop anything superfluous from your life and use what you already have. This whole life thingy looks like a lease with an expiry date, i.e. you will have to hand back all your belongings before leaving the stage.
  • Stop being concerned with everything. As Mark Manson put it: "Maturity is what happens when one learns to only give a f**k about what's truly f**kworthy." A subtler explanation is from Milan Kundera who described it as a choice about the number of mirrors you want to see yourself in. Accept yourself as is, minimize your social media activities and pick only a few people whose opinion you care about. The rest can go and fly a kite.
  • The final thing from my all-time favorite, the mother of COBOL, Grace Hopper: The most damaging phrase in the language is: “it’s always been done that way.”
    DO NOT continue doing things just because this is how you did them in the past. Change in IT is inevitable, not to mention exponential. You need to adapt. It is like a winding road with curves where you need to change speed and direction to stay on it.

long_and_winding_road.jpg

As always, I will be happy to hear your feedback and remarks. Happy riding, Folks! Laszlo

Horseshoe bend #5: Lessons learned so far

The following post is an attempt to summarize the learnings from the first 18 months of our cloud journey. You bet, this is biased, but it might help others who come behind us. Those of you ahead of us may put on your all-knowing smile.

the_rocky_road_to_dublin_v2.JPG

How to go faster - the first steps in the chaos

Public cloud adoption is an intertwining of grassroots experimentation, a mandate from senior management to establish an enterprise-grade cloud presence, and finally the crash landing of the first cloud workloads without a proper foundation. The sooner you establish a program around it, the less chaotic the first months will be.

You need a cloud strategy

that answers questions like:

  • why you want the whole thing in the first place, how and when you declare that you have reached this goal, and what metrics are used to prove it (e.g. cost saving may not be a strategic goal, while speed is).
  • what your core design choices are: cloud architectural design (e.g. hub & spoke vs. VWAN), accepted building blocks (cloud services), CI/CD tool set (source and artifact repo, build and deploy tools), key IT Sec decisions (e.g. rejecting the use of public IP, checking ingress code from the internet, policy layers, the IaC framework and toolset, like Terraform vs. the cloud provider’s native tooling like Bicep) and, most importantly, a decision-making process for reaching these choices.
  • the question of ownership: Cloud is much more than a 3rd datacenter (in fact more than any other IT infrastructure), therefore its governance should be established in the context of Business IT, DevOps, IT security and IT Operations. This is not an ITOps internal affair.
  • The willingness to change everything: I could not find the source of this quote, but I think it is true: “When digital transformation is done right, it's like a caterpillar turning into a butterfly, but when done wrong, all you have is a really fast caterpillar.” You have to change the processes and the org structure if you want to harvest the advantages of the cloud. Without these changes the result will be as slow as its on-prem counterpart.
  • The right level of ITSec control – if too loose, you will be hacked; if too tight, nobody will use your stuff and shadow IT orgs will sprout up everywhere. You need to decide on a few core items:
    • single CSP, or multi cloud, distributed cloud yes/no, cloud native tools vs 3rd party for monitoring, managing, protecting it.
    • how far you are able (and willing) to go with automation, mostly with Infrastructure as Code (IaC). The dilemma is where to stop. The Pareto principle should give us guidance, but it misses one key point: any manual intervention will defeat the purpose of the entire automation. This quote is from 1935, but it is as relevant as ever: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” /Upton Sinclair/
    • what your cloud operating model is: the conservative approach is when the dev teams file a SNOW ticket for everything in the cloud just like on prem, the avant-garde approach is when you give them freedom to implement their preferred PaaS component with their own IaC code and to go YBIYRI (you build it, you run it) for components that are not yet supported by central IT Ops.
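The difference between stop signs and guardrails can be shown in a few lines. In the sketch below, every resource definition in a deployment plan is checked automatically against a list of rules, such as the public IP ban mentioned above, instead of waiting for a manual approval. The resource dictionaries and rule names are hypothetical; this is not Azure Policy, OPA or any other real policy engine.

```python
from typing import Callable, Optional

# A resource definition as an IaC pipeline might see it (hypothetical shape).
Resource = dict
# A policy returns a violation message, or None if the resource complies.
Policy = Callable[[Resource], Optional[str]]


def deny_public_ip(res: Resource) -> Optional[str]:
    if res.get("public_ip"):
        return f"{res['name']}: public IP addresses are not allowed"
    return None


def require_owner_tag(res: Resource) -> Optional[str]:
    if "owner" not in res.get("tags", {}):
        return f"{res['name']}: missing 'owner' tag"
    return None


POLICIES: list[Policy] = [deny_public_ip, require_owner_tag]


def evaluate(resources: list[Resource], policies: list[Policy] = POLICIES) -> list[str]:
    """Run every policy against every resource; an empty result means 'go'."""
    violations = []
    for res in resources:
        for policy in policies:
            message = policy(res)
            if message is not None:
                violations.append(message)
    return violations


if __name__ == "__main__":
    plan = [
        {"name": "vm-app1", "public_ip": False, "tags": {"owner": "team-app1"}},
        {"name": "sql-app1", "public_ip": True, "tags": {}},
    ]
    for violation in evaluate(plan):
        print("DENY", violation)
```

The design choice is that the rules run on every deployment in seconds: the security team encodes its intent once, and compliant requests go through without a human in the loop.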

Establishing the Cloud CoE

  • A program or an org unit: Management needs to find out if you are a project or an org unit. All peer connects (interviews with other enterprises who embarked on this journey earlier) show that introducing the public cloud at enterprise scale is a 5+ year program with likely evergreen residuals. Treating it as a project has implications, eg. 90+ % of the team will leave at the end of the program, taking all learnings with them.
  • Staffing:
    • #1: quick learners with a solid technology background are in high demand. Assigning scraps of time from mediocre performers will defeat the purpose of the whole thing.
    • #2: the imbalance between supply and demand will crank up prices to a point that can jeopardize the financial viability of the program.
    • #3: be prepared to lose your best cloud engineers to employers abroad. Our regretted attrition was way above the internal FTE attrition. A replacement takes about 3+ months to hire, and the ramp-up requires another 3 months, i.e. you are down a top engineer for 6+ months.
    • #4: we underestimated, and therefore understaffed, the process, governance and compliance tasks. Cloud is not only an engineering task but also heavy lifting on process and compliance, not to mention a major change management undertaking. The non-engineering activities are 30+% of the job (the process folks claim it is 50+%...).

Key decisions to make

  • what the public cloud actually is – a 3rd data center or something completely different? The CCoE was convinced that this is different while ITOps insisted that this was just another DC, therefore should behave like one: same technologies, same processes, and nothing else.
  • how far do you want to go with self-service? One approach is to allow You Build It, You Run It where ITOps is not ready to operate the new technology. The advantage is that it lets the dev teams go faster, but it requires them to build operations skills and capacity on their side. Another approach is to channel every cloud request into the existing processes and handle them as if they were on-prem requests.
  • Some dev teams will want to tinker with PaaS components while others will want to concentrate on business logic and application-level tasks. In the latter case, centrally provided cloud services will be required for those who do not want to deal with the PaaS component operations. You need to define the boundaries between YBIYRI and these central cloud services (roles and responsibilities) AND need to establish this managed service layer. (this is mostly not a technical undertaking.) 
  • Drinking from the firehose: the balance between an R&D workshop and a factory, i.e. the number of PaaS services you support vs. the available offerings (not to mention the Marketplace). Do not go beyond 10-15% of the total service offerings, otherwise you will be crushed by their sheer quantity.

The forces that will slow you down

There are two forces at play here: ITSec and ITOps (with Compliance waiting for you around the corner).

itsec_and_itops_1.JPG

  • The on-prem ITOps mindset will dictate that anything in the cloud should function just as if it were on prem. They will demand the same technologies and processes, the same IaaS approach to everything. Their legitimate reasoning is that 95+% of the workloads are on prem today, therefore anything you create should look like the current stuff, since it is easier to operate. The untold driver is fear, which you need to address upfront: nobody will lose their job, but most are likely to have a different job (with a different skillset) within 4-5 years. All of us need to learn and unlearn.
  • ITSec requirements dictate technical solutions that take much longer to implement in a bank than in a small (non-financial) account. It is like running the marathon in a heavy diving suit while everyone else runs in shorts… An example: in a public cloud, cross-regional DR capabilities come out of the box, unless you implement private endpoints, in which case you lose most of this functionality.

running_the_marathon_in_a_heavy_diver_suit.JPG

  • The nose of the ship cannot travel faster than the back of the ship, i.e. it does not really help to produce designs and technical solutions that other parts of the IT org cannot implement, let alone comprehend. This is a lesson we learned the hard way: you need to move the entire ship. Training, constant communication, demos and regular small updates help the transition.

Dependencies

architect_in_the_spider_web.JPG

You will find (at least) the following dependencies:

  • Identity and Access Management – the identity management process and technology, e.g. your IAM system does not work with cloud-native identities and/or it is being replaced and therefore does not accept any changes.
  • Ticketing system – your team gravitates toward JIRA (as most SW dev. projects do) while ITOps will demand ServiceNow. Shoveling data manually from SNOW to JIRA is a pain in the neck but you want to track the hours in a single system.
  • Click-Ops - your IaC code will bump into manual steps in the process, eg. a FW port opening might take a week while your code runs for 45 minutes.

Technical issues

  • If you implement IaC, you need to pay attention to the smooth coexistence between the IaC code and the policies on top of it. It is a daunting task to debug code when both layers are in constant motion.
  • On-prem proxy servers, multiple firewalls and an on-prem DNS vs. your cloud-internal routing design will give you a bunch of networking and name resolution issues where you do not have access to the monitoring logs of any of the on-prem components. It will require smooth collaboration with the network people to resolve simple issues like a wrong conditional access setting.

The exit strategy

There are 3 caveats with a cloud exit:

  • when you mix up a disaster recovery and an exit scenario. The difference is the RTO allowed: the first is measured in hours, the latter in years. It takes the same effort to walk away from a cloud as to walk into it.
  • when you allow only technologies that have an on prem equivalent. This way you do preserve your exit but throw away any innovation produced by the cloud provider. The deeper you go into the PaaS/SaaS forest, the less likely it is that you will ever come out.
  • when the provider’s home state, e.g. the USA, says NO. In this case a cloud-to-cloud exit becomes unattainable (Microsoft, Amazon and Google would leave the local market on the same day).

A reasonable exit strategy should be formulated that is acceptable to the local regulator. Regulatory, compliance and engineering task forces should collaborate under an experienced leader (ideally someone who worked as an auditor before). Think twice before you execute this exit: it will ruin the ROI of the whole thing.

The square peg in a round hole – the lack of public IP 

If we had to name one item that caused us the most headache, it is easily the fact that the public cloud is designed with the internet in mind, i.e. all services can be accessed directly from the internet. In an enterprise environment this is not an option; you have to go private.

The nonfunctional requirements

  • All of these requirements have been known for decades, but they work differently in the cloud, especially for PaaS and SaaS. Think about monitoring, logging, alerting and backup early, and make reasonable compromises with their on-prem counterparts.
  • Cloud monitoring, alerting and logging should be incorporated into the company level monitoring, alerting and logging. It is inevitable because the cloud-based systems will not operate standalone but integrated with on-prem (and later maybe other cloud) systems. In case of a problem an end-to-end view is needed, and it is possible only with an integration between the various monitoring systems.
  • Backup: you need to have a clear view of what you need to “bring home”, i.e. back to on prem, and what is okay to store in the cloud. At the end of the day, it boils down to the level of trust in your cloud provider and the demands of the regulator. Be aware that some of the backups made by the provider are not compatible with anything else, i.e. you cannot migrate them to any on-prem equivalent (e.g. Key Vault).
  • The big shift is when the Application Operations teams will claim a bigger slice of the traditional monitoring and alerting pie, using their own – mostly cloud native – tooling that will overlap in functionality with the tools used by IT Ops.
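An end-to-end view essentially means merging events from several monitoring systems into one timeline and grouping them by something that ties a transaction together. The sketch below assumes, hypothetically, that every system stamps its events with a shared correlation ID and a UTC timestamp; the sources, IDs and messages are invented for illustration, and in practice this job is done by the monitoring platforms themselves.

```python
import heapq
from collections import defaultdict
from datetime import datetime

# Hypothetical, already time-sorted event streams from two monitoring systems.
# Each event: (timestamp_utc, source, correlation_id, message)
ON_PREM_EVENTS = [
    (datetime(2024, 5, 1, 10, 0, 1), "on-prem-proxy", "tx-42", "request forwarded"),
    (datetime(2024, 5, 1, 10, 0, 2), "on-prem-dns", "tx-42", "NXDOMAIN for backend.corp.example"),
]
CLOUD_EVENTS = [
    (datetime(2024, 5, 1, 10, 0, 0), "cloud-app", "tx-42", "calling on-prem backend"),
    (datetime(2024, 5, 1, 10, 0, 30), "cloud-app", "tx-42", "upstream timeout"),
    (datetime(2024, 5, 1, 10, 0, 31), "cloud-app", "tx-43", "health check ok"),
]


def merged_timeline(*streams):
    """Merge several time-sorted streams into one time-ordered list."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


def by_transaction(events):
    """Group events by correlation ID, preserving time order within each group."""
    groups = defaultdict(list)
    for timestamp, source, correlation_id, message in events:
        groups[correlation_id].append((timestamp, source, message))
    return dict(groups)


if __name__ == "__main__":
    timeline = merged_timeline(ON_PREM_EVENTS, CLOUD_EVENTS)
    for timestamp, source, message in by_transaction(timeline)["tx-42"]:
        print(timestamp.time(), f"{source:<14}", message)
```

Seen separately, the cloud team only sees a timeout and the network team only sees a failed lookup; merged, the story of transaction tx-42 reads in one go.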

The non-technical side of the house

We shuffled all non-technical topics into a single team: Process – Governance – Compliance – Cost. In retrospect we underestimated the amount of work and the difficulties related to these topics. (engineering myopia) In fact there is a significant difference between “it works from an engineering aspect” and “it is a service one can provide with a predefined SLA”.

the_real_x_wing_fighter_and_how_it_looks.JPG

  • ITSM processes: IT Service Management processes assume that everything is done by ITOps and the client just files a service request. ITOps is right to claim that an incident is a pain regardless of where it happens, therefore you need proper incident (and change) management processes. If you are an ITIL shop, you will find that a big chunk of the areas covered by ITIL3 are simply not applicable to the cloud (hence the introduction of ITIL4 several years ago).
  • The cost thingy: it is very easy to leave the lights on (the on prem “flat fee, we already paid for it” reflexes kick in), but it will cost you dearly. It is one thing to spin up resources automatically, and tearing them down seems like just a small change in the code (create vs. destroy). But somehow it just does not happen unless you force it. It is no accident that FinOps has become a discipline in its own right in the last couple of years.
  • The service catalog: in case of a cloud request, the client may ask for a subscription and then for a predefined set of PaaS components in it, or just for the subscription, doing the rest him/herself. Ie. you need to clarify what the service catalog should contain.
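A minimal sketch of “forcing it” for the cost thingy, assuming a hypothetical inventory where each resource carries tags and an average CPU figure. The `owner` and `expires` tags and the 5% idle threshold are assumptions for illustration; a real FinOps setup would pull this data from the provider's billing and monitoring services and route the findings to the owners.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Resource:
    name: str
    tags: dict = field(default_factory=dict)
    avg_cpu_percent: float = 0.0  # e.g. a multi-day average taken from monitoring


def teardown_candidates(resources: list[Resource], today: date,
                        idle_threshold: float = 5.0) -> dict[str, list[str]]:
    """Map each suspicious resource to the reasons it should be reviewed for teardown."""
    findings: dict[str, list[str]] = {}
    for r in resources:
        reasons = []
        expires = r.tags.get("expires")  # hypothetical tag holding an ISO date
        if expires is not None and date.fromisoformat(expires) < today:
            reasons.append("expired")
        if "owner" not in r.tags:  # hypothetical mandatory ownership tag
            reasons.append("no owner")
        if r.avg_cpu_percent < idle_threshold:
            reasons.append("idle")
        if reasons:
            findings[r.name] = reasons
    return findings
```

Flagging rather than deleting is a deliberate choice here: the report goes to a human (or a ticket) first, and only the automation that acts on it is what finally makes the teardown happen.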

What comes next

at_the_beginning_of_the_journey.JPG

I want to thank the entire team who walked along with me over the last 18+ months. We are not finished by any measure, and with the quickening speed of change we may not even know what “done” really looks like. What is beyond doubt is that the big players have turned their attention to artificial intelligence. It is a safe bet to forecast that AI will infiltrate all aspects of the cloud within a few years and will become the new battleground.

To finish with some fun: I used Midjourney to illustrate this post. The last prompt I used was this: “the magician pulling the rabbit out of the hat but the audience is not happy, cartoon by David Horsey, --ar 3:2”. Is it possible that AI has already gone rogue?

ai_went_rouge.JPG

As always, I appreciate any comment or feedback.

Horseshoe bend #4 – Mount Rushmore (from the Canadian side)

mount_rushmore_the_backside.JPG

In regulated industries you are required to produce an exit plan before you make your entrée into the public cloud. On prem stalwarts cite this requirement on a regular basis, demanding a plan as detailed as the inroad itself. For a while I figured this was just an excuse from the Luddites to slow down progress, so it puzzled me when I heard it from people whose opinion I do care about. The bug buzzed in my ear for months: what if they are right and this road indeed leads to trouble? What if Mount Rushmore is not so pretty when viewed from the other side? To settle this, I typed “vendor lock-in cloud computing” into Google and Bing. Most answers were sponsored either by cloud vendors or by firms like Cloudflare or Red Hat (Cast AI, VMware, Wasabi etc.) whose real objective was to convince you that you can avoid this trouble with their assistance (that is, by jumping into their trap instead of Amazon’s or Microsoft’s). Some were thoughtless, like the one from an HDD manufacturer arguing that cloud lock-in would lead to a lack of scalability (really?); others were lazy enough to copy entire sections (even the drawings) from each other. Okay, this is useless, so let’s dig deeper. The rest of this article is the result of this digging and of consulting with Lydia Leong from Gartner, peppered with my fondness for computer history. Spoiler alert: when was the last time you listened to music on a CD player? Or to phrase it differently: do you have an exit strategy for your Spotify (Netflix etc.) subscription, that is, do you purchase an on prem copy of each song or movie you like? If you don’t, then read on!

A few definitions:

Disaster Recovery Plan ≠ Exit Strategy ≠ Exit plan ≠ Testing the Exit plan

  • A Disaster Recovery (DR) plan is part of the Business Continuity Plan (BCP) and has nothing to do with an exit. When somebody asks you to execute a cloud exit in days, that is a DR situation, not an exit. For this reason, I omitted situations when the Cloud Service Provider (CSP) becomes insolvent overnight and is forced to shut down its entire service. I also left out cases like a nuclear bomb wiping out all DCs in multiple regions (not just availability zones) of a cloud provider; in that case we have an existential problem way beyond a service disruption. (And yes, Putin is moving these deadly toys into Belarus as we speak…)
  • An Exit Strategy defines the triggers when your Firm will want to or will have to get out of a Cloud agreement. Players in this decision are the Business owners, the IT leadership, Procurement, Legal and the IT architects.
  • An Exit plan is the series of steps (and the players with their specific roles and responsibilities) that are triggered by events defined in the Exit Strategy. It covers technology and business process related changes; it is thus not an IT-only problem at all.
  • Two types of cloud exit: moving an application elsewhere and leaving the platform altogether are two different games. Depending on the players involved in the conflict triggering the exit, you might face either of these.
  • Testing the Exit plan: walking the talk and moving a workload from the original cloud location to A: another cloud provider or B: back to on prem.

Concentration risk is the risk associated with dependence on a single supplier for multiple business capabilities. This applies to on prem IT environments as well. Imagine that you have to move away overnight from an RDBMS provider, with a few thousand DBs and a few hundred thousand lines of PL/SQL code holding the bulk of the business logic of your core applications. The same goes for the runtimes and the language itself from the same provider. You bet you are on the hook. Some smart consultant coined a derivative called cloud concentration risk: the risk associated with dependence on a particular cloud provider for multiple business capabilities, such that a single failure can result in a disruption to multiple aspects of the business. Its on prem sibling is a major outage in your primary data center.
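Cloud concentration risk can be made measurable with very little code. A minimal sketch, assuming you keep a (hypothetical) map of business capabilities to the provider each one depends on; counting capabilities equally and the 50% threshold are simplifying assumptions, not a regulatory rule.

```python
from collections import Counter


def provider_shares(capability_to_provider: dict[str, str]) -> dict[str, float]:
    """Share of business capabilities that depend on each provider."""
    total = len(capability_to_provider)
    if total == 0:
        return {}
    counts = Counter(capability_to_provider.values())
    return {provider: n / total for provider, n in counts.items()}


def concentrated_providers(capability_to_provider: dict[str, str],
                           threshold: float = 0.5) -> list[str]:
    """Providers whose single failure would hit more than `threshold` of the capabilities."""
    shares = provider_shares(capability_to_provider)
    return sorted(p for p, share in shares.items() if share > threshold)
```

In practice you would weight the capabilities by business criticality instead of counting them equally, but even the crude count shows where a single outage would hurt most.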

The triggers: Who can say no?

There are five possible actors in any cloud exit: the service provider, the consumer, the buyer’s regulator and two nation states (the vendor’s and the consumer’s).

the_payers_in_an_exit.JPG

  A. Buyer-seller conflicts: this is in scope for this post.
  B. Buyer in conflict with the seller’s state: it would be a weird idea for any firm (at least in my home country) to pick a fight with the US government, so I risk skipping this one.
  C. Seller in conflict with the buyer’s state: not impossible (eg. the East India Company vs. China), but this one too ended up as type D.
  D. Conflict between two states: the USA banned the sale of key IT technologies (on prem as well) to Russia after its attack on Ukraine. FTR: Russian data localization law already required personal data of Russian citizens to be stored within the Russian Federation, therefore US cloud providers were a no go even before the war.
  E. The buyer’s regulator: whoever claims that the (HUN) regulator said no to the public cloud, pls. show me the actual paragraph in their guidance to prove it.

The types of conflicts between the seller and the buyer (type A):

When the seller says no:

  • A serious violation of the contract terms by the buyer (eg. you posted adult content on your website). In case of an enterprise client this is unusual and would probably trigger a “remove it immediately or…” reminder rather than a hasty service suspension.
  • When you do not pay the bill. This is where the old adage applies: if you owe the bank 50 thousand dollars, this is your problem; if you owe them 5 million dollars, this is the bank’s problem. The bigger your consumption, the more likely the vendor will negotiate, although this is no life insurance.
  • When the seller is told by its state to say no, ie. this is type D. If you plan to substitute AWS with Azure (or the other way around), keep in mind that they are from the same country, ie. subject to any type D issue simultaneously.

When the buyer says no:

  • When the service quality is unacceptable: regular service outages, degradation of service.
  • When the price goes up at renewal without any benefits compensating for it. The usual way of carrying this out is removing an existing discount. This is playing hardball. Not cloud specific: see when the tax collectors of an RDBMS provider show up on December 21st for a little audit.
  • If the cloud provider enters your market as a competitor. (Apple Pay BNPL, anyone?)
  • When you decide to rationalize your cloud footprint, having realized that 3 providers are probably too many.
  • When the innovation dries up (for folks in photography, this is when the Hasselblad 501CM became available in ruby red). I think this is by far the most dangerous thing that can happen in a cloud relationship, since it breaks the balance between the price and what you get for it.

A word on innovation and its relation to vendor lock-in

Repeat after me: Innovation comes from differentiation. Maximizing the value of cloud adoption requires exploiting the provider’s capabilities, thus increasing lock-in. The flip side: the greater your need for portability, the more likely you are to sacrifice some of the benefits of cloud services, and the greater the complexity and cost. The deeper you walk into the cloud forest, the more likely you will stay there for a long time.

the_price_of_moving_away_from_the_cloud.JPG

I met an IT executive who thought that the cloud was nothing more than a 3rd data center owned by someone else. For this reason, he demanded complete symmetry, that is, using components in the cloud only if they had an on prem counterpart (read: IaaS). To be fair, he was right from an exit viewpoint, but he ignored where all major cloud providers have put their effort in the last 5+ years, that is PaaS. This is where most of their R&D spend went, probably besides IT security. Bottom line: the more value you take out of the cloud, the more difficult it becomes to exit from it. In the case of SaaS, an exit is simply a redo exercise: same cost, same time.

To illustrate the innovation story, let me use an old example: the System/360 series mainframes from IBM. This was the first modular, general-purpose, upgradeable series of mainframes with the same OS for all models, that is, running the same application without modifications. It introduced micro-coded CPUs, 8 bit bytes (today it sounds funny, but there was financial pressure to use 6 bit bytes, since memory was expensive), the EBCDIC character set, a new floating point architecture, a nine track magnetic tape drive and backward compatibility with older IBM products through emulation: all in all, a tremendous amount of innovation. It cost a sum comparable to the development of the atomic bomb and the development time was way over the original plans, but within 15 years it drove the seven dwarfs (Burroughs, Sperry Rand, Control Data, Honeywell, General Electric, RCA and NCR) out of the computer business. Was it a true vendor lock-in? You bet it was: it was compatible only with itself, but it was so much the best of its time that it gave rise to the saying “Nobody ever gets fired for buying IBM”. And guess what, this was the seed of the antitrust lawsuit that almost chopped IBM into pieces. If you are into computer history, check out The Mythical Man-Month, the book written by Fred Brooks (the PM of the development, working in tandem with Gene Amdahl, the lead architect).

A word on R&D budgets: if you check out the annual reports of the hyperscale providers and their traditional on prem counterparts, you will find telling numbers. In a nutshell: there is an ongoing shift of profits from the incumbents to the largest cloud players (eg. Amazon is now the largest database vendor, surpassing Oracle). Their net earnings are many times those of traditional HW and on prem SW providers like HP or even IBM. If we assume that each R&D dollar has a similar financial impact at all major players, it is fair to say that the hyperscale providers are on a growth trajectory (because their cloud R&D is larger and is funded by their cloud business, not by a separate cash cow), while their on prem counterparts will face tough times within 5-6 years. This is why IBM paid 34 billion USD for Red Hat: the move was triggered by the realization that they had lost the cloud war. The real point, though, is that the war is no longer fought in the cloud; that one is over, and the battle has moved to AI territory with even bigger stakes.

Busting myths

There are no solutions that eliminate lock-in. Vendors just want you to become locked into their solution instead of someone else’s. Think about it: if Vendor A’s service is 100% compatible with Vendor B’s service, then the ONLY differentiating factor will be the price. This would lead to a race to the bottom on price that would force both vendors to cut back their R&D budgets. In the end they (and you) would be left with commodities where the only differentiator is the price, read: ZERO innovation. There are competing forces at work here: the appetite for innovation on the buyer’s side, intertwined with the need for differentiation on the vendor’s side, plus the demand for freedom to escape those providers whose innovation stream has dried up. Since I used a mainframe example for groundbreaking innovation, I have to mention the other mainframe providers whose only reason to exist is that your primary application runs on their iron and moving away is very, very expensive, and they know it. On the other hand, you do have a choice: which vendor’s lock-in you want to avoid, and which one you accept in order to avoid the other.

A cloud exit plan does not reduce your availability risk. The period when the cloud service is unavailable is way shorter than the time you need to execute any exit plan. You need to address this in your DR plans WITHIN the given cloud itself. (Nope, a cloud-to-cloud exit is not a panacea for resiliency, see below.)

Multi-cloud is not a solution for cloud resiliency, since it is difficult and expensive to implement. I had a chat with a senior IT executive a few weeks ago. When we got to this issue, he figured he would ask his teams either to build any software application targeted at the public cloud to be portable, OR to develop two versions of the same SW at the same time for the two hyperscale providers. I think both of these ideas are impractical: if you build software that uses only the common subset of functionalities, you throw away the bulk of the innovation coming from either provider. If you build for both at the same time, you ruin the business case and the time expectations of the business, ie. I would rather not even start this endeavor.

One more word on multi-cloud: it will eventually happen to most large enterprises, either by choice or by accident, when the business picks a software vendor who happens to use the other CSP. This puts an additional training burden on the internal IT departments of large enterprises, not to mention cranking up the price tags of folks literate in both technologies. (I always talk about two hyperscale providers instead of three; I have no intention to disregard GCP, it is just simpler to express myself this way.)

If your exit is triggered by a regulatory change on either the seller’s or the buyer’s side (for the record: a state decree), this rules out any cloud-to-cloud exit, because the same change will render all of your target exit providers unviable (eg. Russia, unless you consider Alibaba…).

Your ability to execute an exit from your cloud provider does not improve your negotiating position, since cloud exits are complicated and costly, and the CSP knows that the cost of a cloud switch will exceed any price advantage gained through the switch. To be fair, this is no longer the money-printing machine it used to be in the on prem, perpetual-license days. This is a service with the actual cost of building and running astonishingly large data centers all over the world, not to mention their electricity and communication costs. Do not dream about 50% discounts. If you check out the annual reports of key cloud providers, their profitability is in the range of 30-35%. If you consider their buying power and operational efficiency, chances are that 1 kilogram of CPU from them costs less than 1 kilogram of CPU in your DC. (Leaving the lights on when not needed is a different problem, but that is FinOps, a subject for another post.)
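The negotiation argument is plain payback arithmetic. A minimal sketch with made-up inputs: a one-off switching cost, the current annual bill and a competitor's cheaper annual bill. None of these figures come from a real provider, and the model deliberately ignores discounting and the innovation you lose along the way.

```python
from typing import Optional


def payback_years(switch_cost: float, current_annual_cost: float,
                  new_annual_cost: float) -> Optional[float]:
    """Years until the savings of a cheaper provider repay a one-off switching cost."""
    annual_savings = current_annual_cost - new_annual_cost
    if annual_savings <= 0:
        return None  # the switch never pays for itself
    return switch_cost / annual_savings


def switch_pays_off(switch_cost: float, current_annual_cost: float,
                    new_annual_cost: float, horizon_years: float) -> bool:
    """True only if the payback happens within the planning horizon."""
    years = payback_years(switch_cost, current_annual_cost, new_annual_cost)
    return years is not None and years <= horizon_years
```

With a hypothetical 15M switching cost and a competitor 20% cheaper on a 10M annual bill, `payback_years(15e6, 10e6, 8e6)` gives 7.5 years, well beyond a 3-year horizon, which is exactly why the CSP does not lose sleep over your exit plan.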

Containers do not eliminate cloud lock-in: theory (and Kubernetes providers) say that putting applications in containers will solve the cloud lock-in problem with no drawbacks. Tag line: “Once an application is in a container, it is easy and cheap to move it between cloud providers, or between cloud and on-premises environments.” On the one hand, containers and microservices became the hallmarks of cloud native development, and they do ease some aspects of portability. On the other hand, they do not address most of the underlying causes of lock-in. A container management platform is just one of the hundreds of PaaS services available from any of the top cloud service providers. Replacing it with a 3rd party component has no effect on the dozens of other PaaS components required to run a modern application.

Regulators DO NOT want the whole exit plan executed before you go to the cloud with your app. They will be satisfied with plans that can be executed over a reasonable period of time (such as two years), without requiring you to demonstrate your ability to actually do an exit. The effort required to test an exit scenario is comparable to the effort of moving to the cloud itself. Unless the regulator wants to ruin the whole business case for moving to the cloud, they will not demand it. The good news: they have heard of FinTech and BigTech and know that if they overdo their “no cloud please” thingy, they will hurt the entire industry rather than protect it.

Your options

  • Minimize lock-in as much as possible: Cloud IaaS providers are treated like infrastructure resource commodities, and higher-level functionality is avoided wherever possible. This requires a very high level of skills in the IT team and significant engineering effort, time and risk since you assemble your car from thousands of tiny parts coming from several manufacturers. Not recommended, since you lose the innovation and the developer efficiency gains brought by the PaaS components. You throw the baby out with the bath water.
  • Use overlays to minimize cloud IaaS provider lock-in: you can try to minimize lock-in to the cloud IaaS provider by overlaying the provider’s resources with third-party solutions that are portable across multiple environments. This results in a high degree of lock-in to the overlay solutions and vendors, as well as to the ecosystem around those solutions. The cloud IaaS providers may be treated like infrastructure resource commodities, so you lose the innovation brought by the cloud provider.
  • Be loyal to a single ecosystem: you choose one vendor’s ecosystem to base your strategy on, accepting that you will have a long-term dependency on that vendor and must invest in building a strong, trusted relationship with it. Innovation, ease of integration and speed of delivery are the highest priorities. Resiliency is handled within the provider’s ecosystem, using cloud native tools.
  • Be loyal to more ecosystems: you build capabilities on two or more providers, not for resilience purposes but to maintain the balance when negotiating with mega players. You manage cloud concentration risk primarily through a multi-cloud workload placement strategy rather than through a cloud exit strategy. The two clouds you bet on are likely to be two of the three hyperscale players.

The final word: you do need to be prepared to exit your cloud provider, but not for the reasons usually quoted by most articles on the web. The real dilemma is to pick the right provider and to maintain the relationship as long as it provides a competitive advantage to your firm. A cloud exit is a complicated and very long journey. Planning an exit in advance will help you shorten the time to a successful execution, thus jumping from a limping horse to a better one in time. To paraphrase Oliver Cromwell: “Trust in your cloud provider but keep your powder dry!”

As always, I appreciate any feedback on this post.

Horseshoe bend #3 – Midway

midway.JPG

The battle of Midway is symbolic for many reasons: it showed the importance of information security (the key to the success of the US Navy was that they decrypted Japanese communications and knew Yamamoto’s plans), and it marked the end of the era when battleships reigned and the beginning of the supremacy of aircraft carriers. (Not to mention that it was to Japan what Trafalgar was to France.) I realize that the analogy is a bit far-fetched; nevertheless, I build this post around it: while IT security is more relevant than ever for any enterprise, the old way of thinking about it will no longer reach the goal. No, I am not talking about quantum computing and its threat of breaking current cryptography in minutes, I am talking about the cloud. ITSec has to change.

Let me nail it down: I do realize how important information security is; history provides ample proof points. As of today, cyber warfare is on equal terms with any other military branch (think of Stuxnet). On the other hand, a recent study by McKinsey found that the average lifespan of companies listed in the Standard & Poor’s 500 was 61 years in 1958; today it is less than 18 years. If you recall the fate of Blockbuster, Borders Books, Nokia or Kodak, you see the Innovator’s dilemma in action. If you stop innovating, you will wither (sometimes very fast); if you are careless, you will suffer significant material losses (pretty soon).

What we have known for a long time

  • “Navigare necesse est, vivere non est necesse.” Going online (that means mobile) is a must; tuning your business processes for delivery speed is non-negotiable. Gen Z measures a response in seconds and a whole transaction in minutes, and wants it all anytime, anywhere.

gen_z.jpg

  • The ITSec playing field is not level: a threat actor can do way more damage with 1M USD than the good guys can fend off with the same amount of money.
  • The imbalance between demand and supply for skilled ITSec professionals is cranking up salaries to the upper 5-digit range (in EUR) in countries where this used to be the package of middle management. Despite the skyrocketing compensation, there is unmet demand.
  • Hacking is a lucrative profession and a weapon in the arsenal of nation states. The number of data breaches grew in sync with the number of users and the amount of data generated and exposed to the online world. Ugly: yes, surprising: no.
  • The biggest concern in any ITSec protection scheme is the human factor combined with organizational inertia, from careless users and unnoticed human config errors to orgs working in silos, not giving a damn about each other’s motives and agendas. (Read the case of the London Underground fire at King’s Cross and you will know what I mean.)

In summary: as a consequence of the above, more and more firms are moving a significant part of their business online without being prepared, exposing their cyber sec weaknesses to the outside world.

Something happened - what we learned lately

Let me enumerate the changes that have happened in the last 5-8 years in the ITSec arena.

the_moat.JPG

  • The business demands collaboration with entities outside of the main org, thus a significant portion of the value creation process happens OUTSIDE of the castle that you are trying to protect. The “castle and moat” paradigm, even when executed with the utmost rigor, is not enough. If we add the growing segment of SaaS based functional delivery, this statement becomes even more relevant.
  • The public cloud has become indispensable, sucking the bulk of investment dollars from the on prem world and thus becoming a self-fulfilling prophecy. Three groups formed: the hyperscalers, the multi-cloud vendors (riding on these hyperscalers) and the incumbent traditional players.
  • Since hardware is becoming a commodity, there is a power shift towards developers. Yes, they are sometimes closer to a prima donna than a soldier, demanding weird perks. Live with it. For the record: the price difference between a MacBook Pro and a good Wintel notebook is around two days’ compensation of these folks, so be it.
  • A DDoS attack with a botnet made of smart fridges is a novelty, though a pretty sad one. (See my comment on the lack of ITSec expertise, this time at the fridge makers.)
  • The shared responsibility model introduced by the cloud blurs the boundaries and sometimes makes you feel as if it was somebody else’s (ie. the cloud provider’s) problem.
  • The vast majority of recent and future successful cyber security incidents were and will be enabled by human configuration errors. Throwing more human effort at the problem will only generate more errors. Doing it slowly will not make it more secure either.

The need for speed

  • The ability to respond to events in the business environment quickly became the nr. 1 priority for business leadership, regardless of the industry. (COVID, the Russian invasion of Ukraine or the double-digit inflation came overnight.)
  • There is a widening gap in agility between the cloud and devops enabled development units and their IT Sec (and IT Ops) counterparts. IT is getting good at producing new code fast, but is not yet prepared to protect this new code well.
  • You measure the life span of a physical machine in years, a VM in months and a container in minutes. With Kubernetes coming of age with the support of major cloud players, the traditional ways of creating, managing, monitoring and protecting these compute instances become more and more inadequate.
  • Former U.S. Deputy Secretary of Defense William Lynn argues that “cyber-warfare is like maneuver warfare, in that speed and agility matter most”. This guy probably knows a thing or two about cyber security, since he wrote the Pentagon’s cyber strategy in 2010.

What is next

The last part of this post is a list of proposed actions. For the record: being a cloud CoE lead, I am biased, and it is part of my job to be biased. A “conservative revolutionary” is an oxymoron, right?

Accept the paradigm shift

  • A paradigm shift needs to be answered by another paradigm shift: insisting on total manual pre-control and ignoring the importance of speed will put ITSec at odds with the developer communities and eventually with the business. Explain, teach, go beyond saying NO and show how it can be done securely. Sit and breathe with the coders, literally.
  • “Widening the moat”, ie. making it more cumbersome to access data from within the castle (in the cloud), will not protect the firm. As leased lines between company locations became obsolete (my 5G phone runs circles around a 4 Mbit leased line), the moat will soon become obsolete for most volatile apps, or it will move to where the assets to be protected are, ie. to the cloud. It is no accident that MSFT became a significant contender in the unified endpoint management and SIEM (Security Information and Event Management) arena. They had to, in order to make Azure (their new cash cow) prevail.
  • Protecting the identity of users, machines and applications will be (is) the core of the new era. I risk the forecast that biometrics as the primary means of (human) authentication will prevail despite the current legislative hesitation.
  • Turn your teams into developers who author and run the scripts (Ansible, Terraform, shell, it does not matter) that monitor the configuration, hardening and patching state of all assets. Realize that these scripts are real code: you will store them in a source repo and create new releases of them instead of just replacing a parameter in a shell script on your c:\ drive.
  • Be prepared for increasing pressure from cloud vendors: they will exploit the widening functionality gap between their cloud based and on prem offerings, will produce licensing arrangements that make their cloud-based services more compelling (eg. the Hybrid advantage from MS, where you double your existing on prem license amount for Windows servers IF you use their cloud based KMS service) and will eventually discontinue their on prem product ranges altogether, just like Atlassian has already announced.
  • Convert your mindset: thinking in static, dedicated source and destination IPv4 addresses is the past. A cloud provider will not guarantee that the IP address range of a VM scale set or a Kubernetes cluster will be the same two weeks from now as it is today. Think in FQDNs instead of static IP addresses and use the DNS service of the cloud provider.
  • Insist on discipline where it matters: protecting the endpoints, primarily the mobile devices. Discipline applies to senior management as well.
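Thinking in FQDNs mostly means resolving names at connection time instead of pinning addresses in config or firewall rules. A minimal Python sketch using the standard `socket` module; the host name in the usage note is hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out in tests.

```python
import socket


def resolve_all(fqdn: str, port: int, resolver=socket.getaddrinfo) -> list[str]:
    """Return the distinct addresses a name resolves to right now, in resolver order."""
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in resolver(fqdn, port, proto=socket.IPPROTO_TCP):
        ip = sockaddr[0]  # IPv4 and IPv6 sockaddr tuples both start with the address
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def open_connection(fqdn: str, port: int, timeout: float = 5.0) -> socket.socket:
    # create_connection resolves the name on every call and tries each returned
    # address in turn, so a cluster whose IP range changed is still reachable by name.
    return socket.create_connection((fqdn, port), timeout=timeout)
```

A call like `open_connection("orders.internal.example", 443)` (hypothetical name) never needs a config change when the provider reshuffles the addresses behind it; only the DNS record moves.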

Focus on your people

  • Many companies have the cash to buy the best of breed ITSec offerings on the market, but lack the skills and capacity to bring the most out of them. Reverse this trend. Hire the best possible people and explain to HR that compensation tensions are less painful than losing the trust of your clients.
  • Financial realities will force traditional ISVs to port their core offerings to the cloud, and their limited resources will force them to place their bets on these cloud-based versions, so they will slowly but surely abandon their on prem versions. The tendency will reinforce itself with every product iteration. The gap will widen. Beef up your cloud related skills and capacity.

Learn to code and automate everything

  • If your response latency is measured in months due to capacity shortages and you then manually execute a process based on outdated config information, you will miss the target. The more manual steps you put into a process, the more error prone it becomes, introducing “flavors” into the execution. When you add flavors to the process, your quality assurance becomes a lottery. Automate every step in your process, including auditing your own work.
  • Defense in depth: while the “castle and moat” approach is outdated, maintaining various layers of defense is very much alive. The goal is to protect every asset in the org with vigor and investment proportional to the value of the asset. Eg. do not protect information that is already on LinkedIn, but create a dedicated subnet for your really important stuff, with well-monitored control points into these subnets.
  • Patching a vulnerability a year after it was discovered is an autopsy. Real-time monitoring, and detecting and reacting to anomalies in near real time, will be crucial. A voluntary “confession” of ITSec considerations in an Excel sheet is as useful as resuscitating a corpse (except for audit purposes). You need to automate the discovery and eventually the whole response.
  • Go beyond the static (one-time) snapshot mentality, where the name of the game is making any change difficult; accept the new rules and become able to detect changes and respond to them very quickly.
  • Focus on AI: AI will become prevalent in ITSec on both the attack and the defense side. Bluntly put, algorithms will fight algorithms within ten years. (I risk the estimate that this is already the case on the attacker side.)
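Detecting changes instead of blocking them starts with comparing the approved baseline with what is actually deployed. A minimal drift-detection sketch, assuming both sides are available as flat key/value dictionaries (in practice they would come from your IaC state and the provider's resource inventory); the setting names in the usage note are hypothetical.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare the approved baseline of one asset with its observed settings."""
    missing = sorted(desired.keys() - actual.keys())       # approved but not present
    unexpected = sorted(actual.keys() - desired.keys())    # present but never approved
    changed = {
        key: {"desired": desired[key], "actual": actual[key]}
        for key in sorted(desired.keys() & actual.keys())
        if desired[key] != actual[key]
    }
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


def has_drift(report: dict) -> bool:
    """True if any of the three drift categories is non-empty."""
    return any(report[k] for k in ("missing", "unexpected", "changed"))
```

For example, a baseline of `{"min_tls_version": "1.2", "diagnostic_logs": True}` against an observed `{"min_tls_version": "1.0", "ssh_port_open": True}` reports `diagnostic_logs` as missing, `ssh_port_open` as unexpected and the TLS version as changed. Run on a schedule or on every change event, a report like this is what an automated response would act on.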

Bottom line: all vectors point in one direction: ITSec needs to change and has to learn to automate, that is, it has to learn to code. As always, I appreciate your feedback.

PS: The first image is the IJN Mikuma, a Mogami class heavy cruiser sinking during the battle of Midway. Others were generated by https://openai.com/dall-e-2

Horseshoe bend #2 – Are we there yet?

are_we_there_yet_jpg.png

Sponsors tend to want to know how any project in their realm is getting along and, above all, what they get for the money they threw at us. They ask the same question over and over again: Are we there yet? To be honest, when you have requested a few million bucks for a cloud implementation, it makes sense to know what “there” is and to be able to tell when you reach it. This is #2 in the series of cloud-related articles dubbed Horseshoe bend, focusing on measuring the outcomes of a cloud implementation.

In a cloud adoption program there are four groups of people in your organization whose interests you need to cater for: the business (the guys who fund the whole thing), ITSec (the knights who say Ni, or rather No), IT Ops, who see the whole thing as unnecessary, and last but not least the compliance folks representing regulatory scrutiny. The rest of this article attempts to set reasonable targets for each stakeholder group, define metrics for each of these targets and, at the end, argue why you should not stress the whole thing beyond reason.

The business metrics

  • The ability to respond quickly to a surge in demand (or a sharp decline, for that matter) – this is a no-brainer, as long as you apply the ground rules of Infrastructure as Code (AND as long as your cloud provider does not run out of steam). (Metric: being able to spin up additional compute/storage resources within a few hours of the demand arising.) WARNING: it only makes sense to scale the infrastructure dynamically if the application layer is able to take advantage of this capability.
  • The speed of infrastructure design and implementation, from the request until it actually goes live. This is the one with a great effect on developer productivity. The way to achieve it is to use technology building blocks and their underpinning blueprints, combined with automation. I mean full automation, with no manual intervention at all. This requires ITSec and ITOps to GIVE UP pre-control and move to post-control with near-real-time policy violation detection. Approve the design, not the actual instance, and check whether we have strayed from that design.

one_ticket.jpg

The caveat comes when you need to link your shiny new cloud environment with its on-prem buddy, which carries a bunch of legacy technologies and, more importantly, legacy processes. It is like Lightning McQueen pulling Bessie. Yep, it may not be that fast… (Metric: the elapsed time between the first and the last ticket related to designing and implementing a piece of IT infrastructure should be at least 25% shorter than for its on-prem counterpart.)

lightning_mcqueen.jpg
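The ticket-based metric above is simple arithmetic once the tickets are exported: take the span from the first ticket opened to the last ticket closed for one infrastructure request, and compare it with an on-prem baseline. A minimal sketch, assuming tickets can be grouped by request; the `Ticket` fields, the request ID and the 10-day on-prem baseline are hypothetical, while the 25% threshold is the one stated in the metric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    # Hypothetical fields: map them to whatever your ticketing tool can export.
    request_id: str
    opened: datetime
    closed: datetime


def lead_time(tickets: list[Ticket]) -> timedelta:
    """Elapsed time from the first ticket opened to the last ticket closed for one request."""
    return max(t.closed for t in tickets) - min(t.opened for t in tickets)


def meets_target(cloud: timedelta, on_prem: timedelta, min_improvement: float = 0.25) -> bool:
    """True if the cloud lead time is at least `min_improvement` (default 25%) shorter than on prem."""
    return cloud <= on_prem * (1 - min_improvement)


if __name__ == "__main__":
    # Two hypothetical tickets belonging to the same infrastructure request.
    tickets = [
        Ticket("REQ-1", datetime(2023, 3, 1, 9), datetime(2023, 3, 2, 17)),
        Ticket("REQ-1", datetime(2023, 3, 2, 10), datetime(2023, 3, 6, 12)),
    ]
    cloud = lead_time(tickets)
    print(cloud, meets_target(cloud, on_prem=timedelta(days=10)))
```

In practice you would probably compare medians over many requests rather than a single one, since one stubborn legacy ticket can dominate the span.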

  • Cost transparency – this one is easy: implement proper tagging and put a data analysis/visualization tool (a pedestrian Excel with a SQL backend will do) on top of the analytics report. Warning: it can be a double-edged sword in environments used to poor cost transparency. While it can indeed tell to the penny who spends how much on what, this can be pitched as a weakness compared to an on-prem alternative where the costs are unknown, or where the actual user of a service does not feel the pain of their extravagant requests. (Metric: report AND forecast cloud spending by cost center. Produce cost reduction suggestions as a bonus.)
  • Technology adoption speed – the marketplace of any major cloud provider contains thousands of applications and development/management/monitoring tools, two orders of magnitude more choices than your on-prem IT can handle. Balance is the key word here: too much freedom would throw a monkey wrench into IT Operations, while banning the inflow of new technologies would defeat the purpose of the whole thing. Clogging the path of innovation is a very bad idea; therefore, when ITOps can no longer handle a new technology, apply the “you build it, you run it” principle.

innovation_vs_complexity.jpg
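The cost transparency metric (report and forecast spending by cost center) is mostly a grouping exercise over tagged billing data. A minimal sketch, assuming a billing export where each line item carries a `cost` and a `tags` dictionary with a `cost_center` key (the record layout and the tag name are hypothetical). Untagged spend is reported separately, since shrinking it is what proper tagging is for, and the forecast is a deliberately naive linear extrapolation, not what any provider's forecasting service does.

```python
from collections import defaultdict


def cost_by_cost_center(line_items: list[dict]) -> dict[str, float]:
    """Sum billing line items per `cost_center` tag; untagged spend is reported as 'UNTAGGED'."""
    totals = defaultdict(float)
    for item in line_items:
        tags = item.get("tags") or {}
        totals[tags.get("cost_center", "UNTAGGED")] += item["cost"]
    return dict(totals)


def naive_month_end_forecast(month_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Linear extrapolation: assume the rest of the month burns at the average daily rate so far."""
    return month_to_date / day_of_month * days_in_month


if __name__ == "__main__":
    # Hypothetical billing export after 10 days of a 30-day month.
    items = [
        {"cost": 120.0, "tags": {"cost_center": "CC-100"}},
        {"cost": 80.0, "tags": {"cost_center": "CC-200"}},
        {"cost": 30.0, "tags": {"cost_center": "CC-100"}},
        {"cost": 15.5, "tags": {}},  # someone forgot to tag this resource
    ]
    totals = cost_by_cost_center(items)
    for center, spent in sorted(totals.items()):
        print(center, spent, naive_month_end_forecast(spent, day_of_month=10, days_in_month=30))
```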

The technology metrics

If you opt for IaaS, you will have to deal with the same duties as if these VMs were in your data center, and in some cases you cannot avoid deploying VMs in your cloud subscription. Unless you plan to operate what you have built yourself, you need to realize that demanding the same processes as used on prem is a legitimate ask from Ops. The problem arises if those processes are siloed and littered with manual steps. IMPORTANT: the strength of a cloud infrastructure comes from the level of integration between its components. As soon as you start to operate the various components in separate silos, you kill the essence of the whole thing. This calls for a dedicated Cloud Operations team, but that would question the status quo. Anyway, here are the technology metrics:

  • Know what you have: any computing resource deployed for longer than a few hours should be in your CMDB. This is obvious, but it is easy to forget that this CMDB is on prem. (Metric: all CIs are known to the CMDB.)
  • Config management: automation can be a key differentiator here. Rather than trying to find an error in a configuration by eyeballing config files, one could write code that makes sure reality matches the design. (Metric: the number of differences between the designed and the actual parameters.)
  • Monitoring: cloud providers use the same components, architectures, hypervisors etc. (but not the same processes) as you do, and are therefore susceptible to the same errors as their on-prem counterparts. Things will go wrong sometimes, so you have to implement monitoring. For smooth coexistence, feed the metric streams into both the traditional on-prem monitoring tool and its cloud-native alternative. (Metric: key metrics are fed into a monitoring system with defined alert thresholds.) WARNING: it does not matter how good your infrastructure DR capabilities are if the application layer is not prepared to use them.
  • Incident management: what really matters is how fast and meaningful your reaction to an alert is. This topic is covered by ITIL, so I rest my case on the assumption that it is mostly the same as on prem, with one key difference: DO NOT allow anybody to tamper with the production environment manually, since that creates a collision between the parameters set by the automation script and those set by an Operations person. The question is whether you will have the discipline to make changes to the IaC code and then run that code, or whether you cannot resist the temptation to make manual changes. My hunch is that you will violate this rule sometimes…
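Both the CMDB metric and the configuration drift metric come down to comparing two exports. A minimal sketch, assuming you can list resource IDs from your cloud inventory and CI IDs from your CMDB, and can express the designed parameters (e.g. taken from your IaC definitions) and the actual parameters of a resource as flat dictionaries; all IDs, parameter names and values below are hypothetical. A manual change made directly in production, the scenario warned about under incident management, shows up as drift.

```python
MISSING = object()  # marks a parameter that exists on only one side of the comparison


def unknown_to_cmdb(cloud_inventory: set[str], cmdb_cis: set[str]) -> set[str]:
    """Resources running in the cloud that the CMDB does not know about (target: empty set)."""
    return cloud_inventory - cmdb_cis


def config_drift(designed: dict, actual: dict) -> dict:
    """Return {parameter: (designed_value, actual_value)} for every mismatch.

    A parameter present on only one side is reported with MISSING on the other side.
    The length of the result is the drift metric (target: 0).
    """
    drift = {}
    for key in sorted(designed.keys() | actual.keys()):
        want = designed.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Hypothetical inventory and CMDB exports.
    print(unknown_to_cmdb({"vm-001", "vm-002", "db-001"}, {"vm-001", "db-001"}))
    # Hypothetical designed (IaC) vs. actual parameters: someone opened port 22 by hand.
    designed = {"vm_size": "medium", "public_ip": False, "open_ports": [443]}
    actual = {"vm_size": "large", "public_ip": False, "open_ports": [443, 22]}
    drift = config_drift(designed, actual)
    print(len(drift), drift)
```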

The ITSec metrics

None of us wants to fall victim to a hacker attack. I learned the following maxim from ITSec people who were clearly way ahead of me: “You can inflict way more damage with 1 million USD than you can avoid with it.” The playing field is not level, and this is what should make your ITSec cautious. The problem arises when you achieve a relatively strong security posture at the expense of business flexibility. The following list just scratches the surface.

  • Using Multi-Factor Authentication (MFA) for every activity – in the public cloud you are exposed by definition, so your first line of defense is the identity of your users. You need decent Identity and Access Management (IAM) tools and processes. The bare minimum is to use MFA in all cases, not just for the admins. (Metric: yep, MFA for all.)
  • The granularity of admin rights, aka reducing the attack surface: I recall my early days in IT in 1990, when I felt like Mr. Important after getting admin access to the NetWare 2.15 server at my first workplace. Of course, it was permanent; revoking it would have meant a demotion, right? Wrong: you do not need admin access to anything unless you have a job to do on that system. Privileged Identity Management (PIM) is an essential way to reduce the attack surface in time. Of course, using it efficiently assumes that the PIM approval process is fast. In fact, the best option is not to use admin accounts for anything in a production environment at all, but to use service principals instead. (Metric: admin rights are granted only when needed, for a few hours, to as few people as possible. Bury the global admin account in a safe place and use it only as a last resort.)
  • Cloud-native security metrics and best practices: cloud providers will assess your cloud implementation and suggest improvements. Third parties also publish reports on known vulnerabilities (e.g. Sysdig, F5, Red Hat). Read these and act on their findings. It is also wise to commission a penetration test against your own implementation on a regular basis. (Metric: a predefined security score, likely from your provider, and the speed of reacting to its findings.)
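The MFA and admin-rights metrics can be checked mechanically against an export of users and privileged role assignments. A minimal sketch, assuming such an export exists; the `User` and `Assignment` structures, the role names and the 4-hour cap (standing in for “a few hours”) are hypothetical and do not reflect any identity provider's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class User:
    name: str
    mfa_enabled: bool


@dataclass
class Assignment:
    # A privileged role assignment as it might be exported from a PIM tool (hypothetical shape).
    user: str
    role: str
    start: datetime
    end: Optional[datetime]  # None means a standing (permanent) assignment


def users_without_mfa(users: list[User]) -> list[str]:
    """Target: an empty list ("MFA for all", not just for the admins)."""
    return [u.name for u in users if not u.mfa_enabled]


def violating_assignments(
    assignments: list[Assignment], max_duration: timedelta = timedelta(hours=4)
) -> list[Assignment]:
    """Standing admin assignments, or time-bound ones longer than `max_duration`."""
    return [a for a in assignments if a.end is None or a.end - a.start > max_duration]


if __name__ == "__main__":
    users = [User("alice", True), User("bob", False)]
    print(users_without_mfa(users))
    t0 = datetime(2023, 5, 1, 9)
    assignments = [
        Assignment("alice", "db-admin", t0, t0 + timedelta(hours=2)),  # within the cap
        Assignment("bob", "network-admin", t0, None),  # standing admin right
        Assignment("carol", "db-admin", t0, t0 + timedelta(hours=8)),  # longer than the cap
    ]
    for a in violating_assignments(assignments):
        print(a.user, a.role)
```

The buried global admin account would presumably be kept out of this check and monitored separately, since it is meant to be used only as a last resort.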

The compliance metrics

d'Artagnan did not worry about his 2 PM duel with Aramis, since he knew he would probably be dead by then, thanks to his 1 PM duel with Porthos. I am more worried about hackers than auditors, so I do not have metrics for this area yet. (Okay, one: being compliant with the regulatory guidelines, whatever their real meaning is.)

Summary – how do you prove to your sponsor that you have reached the goal?

What follows might look weird after pages spent on defining these metrics: they matter less than what they fail to capture, simply because they cannot measure it, namely the impact of the knock-on effects of a good cloud implementation. As Roy Amara put it: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” I am convinced that cloud computing is going to have a profound effect on how we do computing in the future. It is not an end in itself but an enabler, and we surely do not comprehend all of its implications, since it is hard to notice things in a system we are part of, and it is hard to notice incremental change because it lacks stark contrast YET. As always, I will be happy to hear your feedback.
