Software enhancements and modern monitoring solutions for a large-scale, high-availability distributed control system at CERN

My mandate as a site reliability engineer and senior software engineer at CERN is to maintain the C++ real-time software of a large-scale distributed control system for 2500+ power converters that operate 24/7. In this article I describe the software enhancements and monitoring solutions I have implemented, as well as their impact on the CERN accelerator operations team and the on-call service.