How to put your system in the cloud while keeping your head out of the clouds? Part 2

In the first part, based on my project experience, I presented a subjective list of areas worth considering when migrating infrastructure to the cloud. In this article, I take a closer look at the first three:

Inventory
Potential infrastructure for migration
Migration schedule

Inventory and assessment of the infrastructure's migration potential

Let's assume that we have already decided to migrate, we know why we want to do it, and we have an idea of what our infrastructure should look like. For example:

we need to implement a CI / CD process,
we will build the cloud environment on, say, the popular Kubernetes,
we are interested in a private, on-premises cloud (or, on the contrary, only a public one),
and so on.

Now comes the moment to analyze and verify the potential and limitations of our software with regard to moving to a new software infrastructure, and then to new hardware (a secondary issue).
This task is not trivial; what's more, it turns out to be highly political.
There are two possible general conclusions: "it is possible" or "it is impossible".

Of course, the "possible" verdict covers both "it is possible, but the changes will be so costly that it does not pay off" and "it is possible, easy and profitable" (and everything in between).

Packing components into containers is not a complicated undertaking. Of course, the devil is in the details and sometimes problems arise. In theory, however, this is a fairly simple configuration task.

The conclusions of the inventory, i.e. the assessment of how well the existing software can be containerized, may open the door to cooperation for some suppliers and close it in the face of others. Therefore, this stage may be political rather than technical. I am writing about it because it strongly influences decisions on how to run the project, at least in the initial phase of collecting data and arguments.

But let's leave politics aside and focus on the design tasks.

In my view, the best approach is to perform the inventory in a mixed team of experts from both the organization and the supplier. This way, the organization better understands the specifics of the supplier's software, and the supplier learns the context in which its solutions operate within the organization, which strengthens the cooperation.

It is a good idea to commission a PoC (Proof of Concept) in which the supplier containerizes a selected element of the infrastructure; with such hands-on experience, the assessment of the containerization potential will be more reliable. I know from my own experience that suppliers who care about maintaining the cooperation will gladly undertake such a task. The client, in turn, will be able to verify on working software how the supplier approaches containerization in practice and what the chances of success of the entire project are.

Modularity as an opportunity for success

If the infrastructure was built from modular, service-oriented components, such a PoC should not pose major problems. Components can usually be packaged into containers easily, i.e. turned into a cloud-ready version.
There are of course many challenges here, such as:

communication with the outside world,
security,
the impact of the external container orchestration framework on how the components operate,
monitoring and log analysis,
load distribution and the division of the architecture in the new environment.

Watch out for transport between components. Real-life example

In the solutions provided by Softax, we use many types of transport for communication and data exchange between components. For several years, our components have been built so that the transport can be defined in the configuration. We have our own transport libraries that handle load balancing and failover, and we also use widely available solutions such as gRPC or message queuing systems.

Because the cloud infrastructure has its own component management mechanisms, some features of our transports had to be modified to avoid duplicating functionality (e.g. we disable our load-balancing mechanisms because Kubernetes has its own, which distribute the load across individual pods).

So simply packaging a component into a container may not be enough; some adjustments will almost always be necessary.
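The kind of adjustment described above can be sketched as follows. This is a minimal illustration, not Softax's actual transport library: `TransportConfig`, its fields and the service address `payments.default.svc.cluster.local` are hypothetical. The only real-world assumptions are that Kubernetes normally injects the `KUBERNETES_SERVICE_HOST` variable into pods and that a Service address fronts all pods of a component, with kube-proxy spreading connections across them.

```python
import os
from dataclasses import dataclass, replace
from typing import List, Mapping


@dataclass(frozen=True)
class TransportConfig:
    """Hypothetical transport settings; field names are illustrative only."""
    endpoints: List[str]
    client_side_load_balancing: bool = True  # our own balancing across instances
    failover: bool = True                    # switch to another endpoint on errors


def running_in_kubernetes(env: Mapping[str, str] = os.environ) -> bool:
    # Heuristic: Kubernetes normally injects KUBERNETES_SERVICE_HOST into pods.
    return "KUBERNETES_SERVICE_HOST" in env


def adapt_for_kubernetes(cfg: TransportConfig, service_address: str,
                         env: Mapping[str, str] = os.environ) -> TransportConfig:
    """Return a config that leaves load distribution to the cluster."""
    if not running_in_kubernetes(env):
        return cfg  # outside the cluster, keep the existing behaviour unchanged
    # Inside the cluster, the Service address fronts all pods of the component,
    # so our own client-side balancing would duplicate what the cluster does.
    return replace(cfg, endpoints=[service_address],
                   client_side_load_balancing=False)


if __name__ == "__main__":
    legacy = TransportConfig(endpoints=["10.0.0.11:7000", "10.0.0.12:7000"])
    in_cluster = adapt_for_kubernetes(
        legacy, "payments.default.svc.cluster.local:7000",  # hypothetical Service
        env={"KUBERNETES_SERVICE_HOST": "10.96.0.1"},
    )
    print(in_cluster)
```

Keeping the same configuration type for both environments mirrors the idea that transport is defined in configuration: the component code stays unchanged and only its settings differ. One caveat worth checking in practice: kube-proxy distributes connections rather than individual requests, so long-lived gRPC (HTTP/2) connections may still need attention.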

With our solutions this was not difficult, because they are modular. With monoliths, however, you have to ask yourself some basic questions:

First: can they be broken down into business domains or microservices at all?
Second (more difficult): what kind of division to apply?

In general, any code, even a monolith, can be divided into smaller modules, but you always have to weigh the benefits and costs of such an undertaking. It may turn out that it is not worth doing.

In any case, the inventory, which involves a great deal of analytical, architectural and infrastructure work, must NEVER be skipped. Crucially, it should be carried out in cooperation with the authors of the solutions. I have already encountered very cursory analyses by external auditors who treated digital transformation as an opportunity for political games between suppliers but had no substantive grasp of how the system actually operates in the organization.

The first schedule

Migration should be viewed as a multi-year effort: at least one year for a small company and as many as five years for a large one. These figures come from experience. I am currently participating in a project that has been running for over 1.5 years, and we are perhaps one third of the way through.
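A rough check of these figures, assuming (simplistically) that progress is linear and that the "perhaps one third" estimate holds: with elapsed time t and completed fraction p, the expected total duration T is

$$
T \approx \frac{t}{p} = \frac{1.5\ \text{years}}{1/3} = 4.5\ \text{years}
$$

which is in line with the five-year estimate for large organizations.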

In large organizations, where the infrastructure consists of hundreds or thousands of modules, migration may be conceptually simple but difficult to carry out, not for architectural and technical reasons (in my opinion, these are manageable) but for organizational ones: the division of power, business criticality (i.e. the risk of interrupting the system's continuity), and resistance to change (which may stem from fear of losing influence or position in the organization).

Technological elements

This area is the easiest to define.
When planning the sequence of work, consider the following factors:

1. Software currency.
If the software is out of date, the question arises: should it be upgraded during the migration? Often it has to be adapted to a newer version of the operating system, which entails upgrading system libraries, software libraries, etc. This can be a big undertaking!

2. Operating system. You should answer the following questions:

what operating system do we run on?
is it worth changing?
what version of the system are we running? can we, and should we, upgrade the software?

3. The criticality of the component from the business point of view. You should answer the following questions:

Where are the business risks of the migration, and of the problems that may arise from it, the greatest?
Which business functionalities are core and which are supporting? For example, user authorization and account access are the core, critical business areas of a banking system, while the offers module is business-relevant but less critical.

4. Workload. When moving to the cloud, we usually don't know how the new environment will perform; we have to check it in practice. It is better to run such checks on less loaded and, of course, less critical infrastructure elements. We don't know in advance how, for example, Kubernetes will cope with increased load. This has to be learned, and the learning can take many months.

5. Complexity of the functionality. This is a non-trivial issue. As mentioned before, the migration may involve a shift from an architecture closer to a monolith towards microservices. For one of our clients, we face a decision on whether to move the architecture from an orchestration model to an event-driven model.

In the first phase, I advise migrating the infrastructure as-is; in later phases, you can tackle a move towards a more microservice-based or event-driven architecture (although if a functionality is well-tuned, well understood and has been running for years in a given architectural model, it is worth considering whether changing it to follow current technology trends actually pays off).

Regardless of the approach, the more complex the functionality, the more nuances we can overlook, and thus the more mistakes we can make during the migration.

So let's start with simple functionalities. Let's test how they work in the cloud, how to access their logs, how they handle data, how we can monitor them, how they fare in a security audit (so important in banking infrastructure), etc.

Fortunately, every organization can answer these questions fairly quickly and plan the order in which individual systems and applications should be migrated. A sensible manager will start with the part of the infrastructure that is least critical and at the same time most likely to succeed, and will leave the most difficult parts for the end; some of them may never be migrated at all (this is also possible and, in my opinion, an acceptable outcome).
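The ordering logic described above can be sketched as a simple scoring exercise. This is a minimal sketch under stated assumptions: the 1 to 5 scales, the weights, the deferral threshold and the example components are all hypothetical and would have to come from the organization's own inventory. It only shows the principle: least critical, least loaded and simplest first, with the riskiest parts left for last (possibly for good).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Component:
    """One inventory entry; the 1-5 scales are an illustrative assumption."""
    name: str
    criticality: int  # 1 = supporting (e.g. offers), 5 = core (e.g. authorization)
    load: int         # 1 = lightly loaded, 5 = heavily loaded
    complexity: int   # 1 = simple functionality, 5 = complex
    outdated: bool    # needs OS/library upgrades as part of the move


def migration_risk(c: Component) -> int:
    # Hypothetical weights: business criticality counts most, then load and complexity.
    return 3 * c.criticality + 2 * c.load + 2 * c.complexity + (3 if c.outdated else 0)


def plan_migration(components: List[Component],
                   defer_above: int = 30) -> Tuple[List[Component], List[Component]]:
    """Order components from least to most risky; those above the threshold
    are left for the end (and may never be migrated)."""
    ordered = sorted(components, key=lambda c: (migration_risk(c), c.name))
    migrate_first = [c for c in ordered if migration_risk(c) <= defer_above]
    leave_for_last = [c for c in ordered if migration_risk(c) > defer_above]
    return migrate_first, leave_for_last


if __name__ == "__main__":
    inventory = [
        Component("authorization", criticality=5, load=5, complexity=4, outdated=False),
        Component("offers", criticality=2, load=2, complexity=2, outdated=False),
        Component("reporting", criticality=2, load=1, complexity=3, outdated=True),
    ]
    first, last = plan_migration(inventory)
    print([c.name for c in first], [c.name for c in last])
    # ['offers', 'reporting'] ['authorization']
```

The numbers matter less than the discussion they force: agreeing on a score for each component is a practical way to make the inventory's conclusions explicit.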

Lots of environments

I know from experience that the most difficult time is the transition period, when you have to maintain two or more versions of the distribution processes. Every large organization (I am referring mainly to banks) has many environments in its software development process; we can assume there are usually four: development, testing, pre-production and production.

Once we launch a development process based on the new type of distribution (distribution to the cloud), we must maintain at least two distribution versions for some time:

the development path served by the new model and the production path served by the current one.
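The transition period can be illustrated with a small model of the four environments. It is a sketch under one simplifying assumption (hypothetical, not taken from any specific organization): the cloud distribution path is rolled out environment by environment, starting from development, so at any moment it covers a prefix of the promotion chain. The function names are illustrative.

```python
from typing import Dict, Optional

# The four typical environments, in the order in which releases are promoted.
ENVIRONMENTS = ("development", "testing", "pre-production", "production")


def distribution_paths(cloud_up_to: Optional[str]) -> Dict[str, str]:
    """Map each environment to the distribution path serving it.

    cloud_up_to is the last environment already switched to the cloud
    path (None means no environment has been switched yet)."""
    cutoff = -1 if cloud_up_to is None else ENVIRONMENTS.index(cloud_up_to)
    return {env: ("cloud" if i <= cutoff else "legacy")
            for i, env in enumerate(ENVIRONMENTS)}


def pipelines_to_maintain(cloud_up_to: Optional[str]) -> int:
    # Every distinct path still in use is a pipeline the team has to keep running.
    return len(set(distribution_paths(cloud_up_to).values()))


if __name__ == "__main__":
    for stage in (None, *ENVIRONMENTS):
        print(stage, pipelines_to_maintain(stage))
```

Under this assumption, the team maintains a single pipeline only before the first switch and after production moves; for every intermediate stage it has to keep two running, which is exactly the load that automation should absorb.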

Therefore (and this is one of the most important conclusions), before starting such an undertaking, we should already have refined distribution processes based on the CI / CD model, to relieve administrators, testers and analysts as much as possible, so that they can release new software versions to production frequently, responsibly and with a clear conscience.

Why? An organization usually has lean teams of specialists (often even slightly understaffed relative to its needs, which is a natural state). As a rule, all team members are busy all the time, so the migration process should not multiply their tasks but rather reduce repetitive work and let them focus on optimization.

People

The entire project ties up a lot of people from the very beginning. Therefore, for the migration to be feasible, the team should be relieved through maximum automation of processes. I deliberately write "maximum" rather than "full", because I approach this topic realistically. Automation is not an end in itself; its purpose is to free people so that they can handle large-scale migration tasks. If the organization already has automated distribution processes:

the team is already sized to its needs and there is little room for optimization,
you need to hire the right people, and it takes many months before they know the company's infrastructure well enough to be really useful. My experience shows that even an ambitious person experienced with large-scale solutions needs at least half a year to understand the infrastructure to be migrated. I believe it takes a year to learn its details well enough to spot the nuances and hunt for lurking traps.

Contrary to appearances, an organization that has not yet implemented CI / CD processes does not necessarily take longer to migrate to the cloud than one that has. The time may be similar or even, paradoxically, shorter for the former.

The aspects related to the social hierarchy in the organization and the division of powers and competences are more delicate. When creating the schedule, I always take into account such factors as the personalities of team members and what they may gain or lose in standing within the organization as a result of a given project. In my opinion, these factors matter more than pure technical competence.

I will not give you a ready-made recipe here. I know from experience that well-established people find it hard to commit to a project that may change their position. In such a situation, it may be better to involve people with somewhat lower technical competence who are not afraid of change and who can raise their standing in the organization through the new tasks. It is always easier to acquire technical competence than to win experts over to a project that, in their opinion, will lower the prestige of their work or even cost them their jobs (in fact, that would not happen, but a strong fear of change is very hard to overcome).

Summary

In this second part of the digital transformation series, I focused mainly on preparatory tasks:

inventory,
potential for change,
designing the schedule,
and, last but not least, preparing the people affected by the change.

These are preliminary tasks; once they are done, we have a more precise vision of the change.
In the last part, we will discuss the specific work and how to carry it out, i.e. how and with whom.
