The main function of IT operations is to keep IT services running smoothly and efficiently. It would be nirvana if IT infrastructure services just work perfectly throughout their lifespan after they are initially installed and configured. However, the reality is that hardware fails, bugs are found, features need to be added, security flaws are found, patches need to be applied, usage fluctuates, data needs to be protected, upgrades need to be done, demand increases, etc. The following are the important job of IT operations:
Monitoring is the only way to keep track of the health and availability of the systems. Monitoring can be accomplished by looking at the system’s health via dashboard or console, or via specialized monitoring appliances. One major component of monitoring is alerting via email or pagers when there is a major issue. Monitoring can also come from incident tickets generated by machines or end users that may not be apparent via machine monitoring. System logs can also be used for trending and monitoring as it can bring into light some flaws on the system.
Once issues are detected, IT operations should be able to troubleshoot these issues and fix them as soon as possible to bring the service back online. Issues that are more complicated to fix should be escalated to vendors, higher level support, or engineers and developers.
IT operations should not make any changes (such as configuration change, hardware replacement, upgrades, etc) without following the proper change control procedure. More than 50% of outages are caused by changes on the system. IT services are often tightly integrated with other system and a change on one system may be able to affect the others. Subject matter experts of the various systems should make sure that a change will not affect their system. Planning and testing are vital steps in performing changes.
IT operations should monitor and trend the utilization of resources (compute, storage, network) and allocate resources to ensure that there is enough capacity to serve demand. They should be able to predict and allocate resources so that there is capacity when they are needed.
IT operations should optimize IT services and ensure efficient use of resources. The goal is provide an excellent user experience for these services. Mechanisms such as redundancy, local load balancing, global load balancing and caching improves utilization, efficiency and end user experience.
In addition to keeping the IT services running smoothly, IT operations should also protect these systems and their data by backing them up and replicating them to a remote site. The goal is to bring these systems back online in as little time as possible when there is a catastrophic failure on the systems.
IT operations should also be responsible for securing the systems. Due to its enormous task, a lot of companies employ a dedicated Security Operations Center (SOC) that watches security breaches.
One of the goals of IT operations is to automate most manual activities via scripting and self-healing mechanism. This enables them to focus on higher value tasks and not get bogged down on repetitive tasks.