Stars in my Database Server – AI for Data Centers

Aabha was a playful girl who had just graduated and now handled the huge databases of an enterprise. Besides eating pani puri and ice cream after work, she loved stargazing. She was always ecstatic about her village, where she flew off to every Friday evening. It was a quiet Friday, at the end of her first month of work. She was excited to go to the office that day, her thoughts fixed on the night she would spend stargazing on the rooftop of her home in the village. She had a great lunch with her close friend, played table tennis afterwards, and exchanged plans over tea with her closest mate. She was back at her desk when, ting, there was a Severity 1 incident on a critical production database that was running hot.

The world around the workstation was nervous. Everyone wore headsets like crowns, eyes glued to their screens, switching windows between Google, Stack Overflow, Database Expert Exchange and MSDN, and copy-pasting DMV queries into the group chat. She had to coordinate with her seniors and get the system back to a healthy state. All her plans were paused. In that moment her other thoughts went completely silent, and she focused only on the problem at hand. An hour and a half flew by like a world-record lap, feeling like mere seconds. The call slowly eased, and people began dropping off the conference call one after another. The breather slung her thoughts back to her stargazing plans.

Back home right on time, she grabbed a quick plate of food and launched herself to the rooftop. She had the whole setup ready on that new-moon night: the mattress, the quilt, a monocular, and a hot coffee to sip. The stars dazzled and twinkled until the clouds came in as cover. Her heart beat in peace, and she could feel time stall, sense the cold breeze, and hear the leaves crackle.

The events that had taken her emotions on a roller-coaster ride replayed like an action movie. She smiled, feeling as if she had been part of an Avengers team saving the planet, remembering that she had brought the database server back to normal. At that moment, the thought of JARVIS reminded her of the possibilities. She started to think: what if I could see a problem before it came?

She messaged her boyfriend Arsh, a tech freak who thinks aliens are his closest friends and that everything humans live through today is a simulation someone has already designed. He never wants to accept that this someone is God, though.

He saw an opportunity to design something as a gift for his girl. He asked whether her company already collected metrics that could help forecast a server’s behaviour. She confirmed that they collected the server’s performance metrics every minute. Arsh’s immediate reaction was, “Wow, what an easy problem. I now have a diamond mine.”

Arsh started aggregating the data to simulate similar behavior and immediately realized that the OLTP load created patterns along the time axis, so time series methods could be used to forecast it. He cleaned the data and normalized it by averaging the samples over 15-minute windows. He concluded that 3 months of data would be enough to model the behavior appropriately. After some correlation and causation analysis, it was evident that not all of the 230 metrics collected every minute were useful, so he quickly narrowed them down to the 9 columns that were crucial for the time series. After trying out some regression techniques, he settled on Facebook’s Prophet for the time series forecasting.
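The 15-minute averaging step can be pictured with a small sketch. This is only an illustration under assumptions, not Arsh’s actual code: the function name `downsample_15m`, the CPU metric and the sample readings are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_15m(samples):
    """Average (timestamp, value) pairs into 15-minute buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its 15-minute window.
        bucket_start = ts.replace(
            minute=ts.minute - ts.minute % 15, second=0, microsecond=0
        )
        buckets[bucket_start].append(value)
    # Return the buckets in time order, one averaged value per window.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical per-minute CPU readings covering 30 minutes.
start = datetime(2020, 1, 6, 9, 0)
cpu = [(start + timedelta(minutes=i), 40.0 + i) for i in range(30)]

averaged = downsample_15m(cpu)
for bucket_start, value in averaged:
    print(bucket_start.isoformat(), value)
```

Thirty one-minute readings collapse into two rows, one per 15-minute window, which is the shape a forecasting model would then be fed.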

The initial analysis was univariate against the time axis, but the problem was multivariate. Facebook’s Prophet provides a way to add extra regressor components to the same model, and it worked like a charm for Arsh. He was able to reconstruct the entire state of the server for the next x days. To check that the forecast held up, he collected the actual data over those x days and compared the two. The result was startling: the prediction was mostly spot on.
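The story does not say how Arsh measured “spot on”. One common way to compare a forecast with the actuals is to compute the mean absolute error and the mean absolute percentage error, sketched below. The function names and the sample numbers are assumptions for illustration, not values from the real server.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute gap as a percentage of the actual value.

    Points where the actual value is zero are skipped, because the
    percentage error is undefined there.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty series of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("all actual values are zero")
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical CPU utilisation (%): observed values versus the forecast.
actual_cpu = [50.0, 60.0, 80.0, 40.0]
forecast_cpu = [45.0, 66.0, 80.0, 44.0]

mae = mean_absolute_error(actual_cpu, forecast_cpu)
mape = mean_absolute_percentage_error(actual_cpu, forecast_cpu)
print(f"MAE: {mae:.2f} points, MAPE: {mape:.1f}%")
```

A low percentage error across the forecast horizon is one way to put a number on “mostly spot on”.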

The system also needed to identify possible future anomalies (outliers). These anomalies are the signals that say a severe incident may be queued up in the next x days. A method called Isolation Forest proved more precise here than the PCA or KNN methods: the incidents triggered at exactly the times it predicted. Arsh was happy with the result.
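To give a feel for why Isolation Forest suits this job, here is a simplified, from-scratch sketch of the idea: random splits isolate unusual points in fewer steps than ordinary ones, and a short average path means a high anomaly score. This is not the library implementation Arsh would have used; the function names, the two metrics (CPU % and wait time) and the data are all hypothetical.

```python
import math
import random


def _c(n):
    """Expected path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": _build_tree(left, depth + 1, max_depth, rng),
        "right": _build_tree(right, depth + 1, max_depth, rng),
    }


def _path_length(tree, point, depth=0):
    """Number of splits needed to reach a leaf, adjusted for the leaf size."""
    if "size" in tree:
        return depth + _c(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return _path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score each point in (0, 1); values near 1 are easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        _build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = _c(sample_size)
    scores = []
    for p in points:
        avg = sum(_path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-avg / norm))
    return scores


# Hypothetical (CPU %, wait ms) readings: 63 normal intervals and one spike.
data_rng = random.Random(42)
readings = [(data_rng.gauss(40, 3), data_rng.gauss(5, 1)) for _ in range(63)]
readings.append((98.0, 120.0))

scores = anomaly_scores(readings)
worst = max(range(len(readings)), key=scores.__getitem__)
print("most anomalous reading:", readings[worst])
```

Run against forecast values instead of history, the same scoring would flag the future intervals that look least like normal behaviour.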

Arsh committed the code to a GitHub repository owned by Aabha and wrote a README for her. That Monday, she was back in the office, excited about her gift. She wanted to show it to her peers immediately. She quickly spun up a VM on the cloud, replayed the load of the problem server, and tried to forecast the server’s performance for 5 days. Each day she captured small snippets that compared the prediction with the actuals.

Her team was introduced to the magic. It was an amazing feat, the kind not seen every day. The news swept across the floor, and within a few days the entire building was aware of the capability. Aabha was featured as an achiever on the notice board for a quarter.

Arsh’s watch vibrated with a note from Aabha, an excited message that made his mind pop with joy. Arsh was glad that Aabha could gaze at the stars every Friday evening and think of him for presenting her with so many stars that twinkled, mirrored in her retinas, while she took in their names, colors and lumens.
