The Optimal Makeover
How Sun re-engineered its Bangalore data center to overcome power and cooling issues
By S Raghotham
Sun Microsystems’ India R&D operation has been housed in nearly 200,000 sft. of leased space, spread over six floors, in Bangalore’s Divyashree Chambers since the late 1990s. The company has grown over the years, and so has its data center in the same building. By the beginning of 2007, the data center had spread over 13 rooms occupying nearly 11,000 sft. Because equipment had been added over the years to meet the demands of fast growth, problems that affected the business had begun to appear: energy consumption, cooling, and the need to expand to keep pace with the business.
At the same time, the movement toward ‘greening IT’ had picked up pace, and both the government and businesses had started asking their IT vendors to show their ‘green’ credentials and capabilities. Prompted by its own business requirements as well as by these external stimuli, Sun decided to begin ‘green’ charity at home by making its own data centers energy efficient and environmentally friendly while deriving economic and business benefit from them. It decided to carry out the project at three data centers worldwide: at Santa Clara, where it is headquartered; at Blackwater in England; and at Bangalore.
With no more room to expand, Sun had to decide in early 2007 either to build another data center elsewhere or to rationalize the equipment in the existing one and reduce the space it occupied. The second option would serve current IT requirements and make space for future ones. Reducing the real estate occupied meant increasing the density of IT equipment in the new setup, which raised power and cooling issues. These were seen both as practical problems that affected availability and robustness and as opportunities to gain energy efficiency and ‘green’ credentials.
Sun put K V Ramesh, its regional workplace manager, in charge of the project. “We were facing data center expansion, power and cooling issues due to lack of space. Also, lots of new hardware was coming in as business units were expanding or new units were being formed. We had to either build another data center elsewhere or do what best we could do here. We decided to take the latter option,” Ramesh says.
By reducing the data center from 13 rooms spread over several floors to four rooms on the ground floor of the building, Sun achieved a 51 percent reduction (from 11,000 sft. to less than 5,400 sft.) in the real estate occupied by the data center. Through design choices and power and cooling solutions, it achieved a 17 percent reduction in utility power consumption; and by rationalizing IT equipment and replacing a small percentage of legacy servers with new Sun hardware, it achieved a 154 percent increase in compute power. Beyond the numbers, Sun also gained remote connectivity to the data center and remote manageability of it. So, how did Ramesh do it all?
The Business Requirements
Ramesh was given a set of business requirements: the ‘new’ data center had to give Sun better availability and better performance than before, and it had to be ready in the shortest time possible, 96 hours. Further, since the data center catered to an R&D setup, people had to be able to connect to and disconnect from machines as required. To ensure that business productivity also increased, the data center had to be designed so that people could plug into and out of any machine, and do so remotely from their desktops or from off-premises locations. The data center as a whole had to be remotely manageable.
To meet these daunting requirements, Ramesh studied how best to achieve those objectives while also deriving power, cooling and space benefits. He studied the feasibility of building a new data center at another location and ruled it out because of the cost and issues involved. Not only would it involve high upfront costs, it would also involve high running costs, as a remote data center would need to be connected to the main office through high-speed broadband links, and would need to be secured and staffed. He also studied the cost and ease of implementation of the various options, as well as how to orchestrate and complete the entire exercise in 96 hours. “It took a month to calculate all those numbers,” says Ramesh.
The Design Choices
Two considerations were recognized from the beginning: modularity and scalability. They had to be achieved both at an overarching level and at the level of individual elements of the data center. At the overarching level, the data center as a whole had to be modular in design so that, if Sun ever decided to shift to a new location, the data center equipment could be reused. As Ramesh put it, the design had to be such that the data center and the entire IT and networking infrastructure could be moved easily and rebuilt at another location in the minimum amount of time. Scalability had to be built into the design to accommodate future density. At the level of individual elements, modularity was seen as important to ensure effective and efficient use of network cabling, power and cooling, as well as remote interchangeability and remote manageability of the machines.
Other design considerations included UPS connectivity, rationalization of IT hardware in terms of both form factor and computing power, and right-sizing every element of the data center so as to achieve the desired density, energy and real estate efficiencies while creating headroom for expansion.
Before Ramesh embarked on the project, the data center had a mix of servers from Sun, HP, IBM, Dell and Connoi, and the storage included NetApp boxes. It was also a mix of 4U, 8U and 12U systems. By the end, the number of servers had been reduced from 3,000 to 2,700 and they had been rationalized into 1U systems, reducing the rack footprint. By replacing just a small number of legacy servers with new 8-core Sun T2000 machines, the data center gained a 154 percent increase in compute power. At the same time, the form factor rationalization enabled Ramesh and his team to reduce the space occupied by the data center by 51 percent.
Next in priority was the choice of UPS system and power connection design.
After considering factors such as deployment speed, the flexibility to scale and reconfigure the system, and the requirement for increased reliability at reduced cost, the team chose UPS systems from American Power Conversion (APC). Ramesh chose an N+1 design. Each rack was connected to two UPS machines to ride out the utility power outages that the data center faced, on average four a day.
Then came one of the most important choices: the cooling solution. The traditional approach is to build a raised floor and run the cooling under it. But Ramesh decided against underfloor cooling because “it would not solve our high density requirements.” Instead, a raised floor was built only for structural requirements. He chose APC’s Row Cooling systems because they could be placed close to the server racks, were sensor-based and so could be monitored remotely, and were dynamically operable.
A critical requirement in a data center is for the chillers to continue supplying chilled water through the pipes even during a power outage. If they do not, the temperature of the water rises. According to Ramesh, the team had to consider building a 40-ton chilled water reservoir. The choice of the APC cooling units helped avoid the reservoir, as the units can accept a higher inlet temperature and still perform within threshold limits. Moreover, Ramesh connected the chillers to the UPS rather than to the utility power supply, ensuring that the supply of chilled water continued during a power outage.
Now, how do you connect all of this and achieve modularity, which had been identified as a basic requirement? Ramesh chose a POD architecture to arrange the servers and storage, the UPS systems and the cooling system. A POD is a group of racks or benches that share a common hot or cold aisle; it is used as a building block to simplify data center design for power, cooling and cabling. Further, it is vendor independent, can sit on a slab or a raised floor, enables flexibility and scalability, and supports high density. An important element of the POD is the Intermediate Distribution Frame (IDF). “The IDF helped reduce the amount of copper used and enabled modularity by allowing data center managers to increase or decrease and move network ports between racks easily,” Ramesh explains.
Finally, Ramesh had to right-size the infrastructure for the present while allowing headroom for the future. “Sizing everything right was critical. It is easy to go wrong on either side in a project like this. The sizing was done based on actual power and cooling consumed, and after studying future product road maps.” Headroom was ensured by putting in oversized piping, by the ability to add power modules to the existing UPS frames, by the ability to increase or decrease network ports per rack, and by grouping racks in a way that allowed space to be freed up when needed.
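The N+1 choice and the headroom gained by adding power modules to existing UPS frames can be illustrated with a small sizing sketch. The article gives no load or module ratings, so the kilowatt figures below are purely hypothetical, and the functions are an illustration of the general N+1 rule (enough modules to carry the load, plus one spare), not Sun's or APC's actual sizing method.

```python
import math


def modules_for_n_plus_1(load_kw, module_kw):
    """Return the module count for N+1: N modules carry the load, plus one spare."""
    n = math.ceil(load_kw / module_kw)
    return n + 1


def survives_single_failure(load_kw, module_kw, installed):
    """True if the load is still covered after any one module fails."""
    return (installed - 1) * module_kw >= load_kw


# Hypothetical figures, not taken from the article.
MODULE_KW = 10          # assumed rating of one UPS power module
present_load_kw = 70    # assumed present load
future_load_kw = 95     # assumed load after expansion

installed_now = modules_for_n_plus_1(present_load_kw, MODULE_KW)
needed_later = modules_for_n_plus_1(future_load_kw, MODULE_KW)
modules_to_add = needed_later - installed_now

print(f"Modules installed for present load: {installed_now}")
print(f"Modules needed for future load:     {needed_later}")
print(f"Modules to add to existing frames:  {modules_to_add}")
```

With these assumed numbers, growth is met by sliding extra modules into the frames rather than by replacing the UPS, which is the kind of headroom the design aimed for.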
Doing it all in 96 hours
Ramesh attributes the success of the project to the extensive planning done over several weeks before the actual execution. “To sum up, we looked at what current resources we had versus what we would want in the future; the current power and cooling numbers we had versus what we wanted to see in the future.” In parallel with planning and design, Ramesh and his team also carried out product evaluations. In the end, they chose APC to supply the UPS and cooling solutions; Tyco to supply the physical layer; Cisco for networking gear; Raritan for IP-based KVMs; Lantronix for serial consoles; Fluency for an RFID-based inventory management solution; and Infonet network systems to do the network integration.
The hardest part was, of course, the actual execution. The Sun team had to coordinate and choreograph workmen, suppliers and sundry others to complete the job flawlessly in a short period. “There are nearly 10,000 cables connected down there in the data center. Even if we took only 30 seconds to connect each cable, connecting them all alone would take 5,000 minutes,” says Ramesh. They also moved 3,000 servers from 13 rooms across floors into one room, completed the networking, and brought in and installed rows of UPS and battery systems. Since they could not get new power, they had to make do with the power connections that Sun was already using to run its dispersed data center. Ramesh and his team disconnected the power and chillers from those 13 rooms and diverted them all into the new data center room. “We conducted a 24/7 operation for four days. Planning and executing the operation required that we go down to a level of detail that enabled us to coordinate even the movement of the lifts in the building,” he says.
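Ramesh's cabling estimate is easy to verify, and it shows why the work had to be done in parallel. The sketch below uses the figures from his quote (10,000 cables, 30 seconds each) and the 96-hour window; the 24-hour cabling budget used to estimate crew count is a hypothetical assumption, not a figure from the project.

```python
import math

CABLES = 10_000          # from Ramesh's quote
SECONDS_PER_CABLE = 30   # from Ramesh's quote
WINDOW_HOURS = 96        # total migration window

# Hypothetical: suppose only one day of the window could go to cabling.
ASSUMED_CABLING_BUDGET_HOURS = 24


def serial_hours(cables, seconds_per_cable):
    """Hours needed if one person connects every cable back to back."""
    return cables * seconds_per_cable / 3600


def crews_needed(cables, seconds_per_cable, hours_available):
    """Smallest number of parallel crews that fits the work into the time."""
    return math.ceil(serial_hours(cables, seconds_per_cable) / hours_available)


total_minutes = CABLES * SECONDS_PER_CABLE / 60
total_hours = serial_hours(CABLES, SECONDS_PER_CABLE)
share_of_window = total_hours / WINDOW_HOURS
crews_for_budget = crews_needed(
    CABLES, SECONDS_PER_CABLE, ASSUMED_CABLING_BUDGET_HOURS
)

print(f"Serial cabling time: {total_minutes:.0f} minutes ({total_hours:.1f} hours)")
print(f"Share of the 96-hour window: {share_of_window:.0%}")
print(f"Crews needed for a 24-hour cabling budget: {crews_for_budget}")
```

Done serially, cabling alone would consume most of the 96 hours, leaving almost no time to move servers or install UPS systems, so several crews working at once were unavoidable.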
Benefits
At the end of it all, Ramesh had reduced the number of servers from 3,000 to 2,700 while increasing available computing power by 154 percent. He had reduced the real estate occupied by the data center by more than half and utility power consumption by 17 percent. Most importantly, Ramesh had built a data center that was remotely manageable, in which any business unit of Sun could plug into any rack, and in which power and cooling parameters could be remotely monitored and metered. “We now have a data center that is not just very efficient and green but it is also based on a standard framework that makes manageability easy in terms of application provisioning and performance, which is what the business demands,” Ramesh says. “We have had only one outage, but no impact, in the nine-ten months since this project. It has been up almost 100 percent of the time,” he proudly adds.
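The headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the article; the per-server figure is a derived average that assumes compute power is spread evenly across servers, which is a simplification for illustration rather than a claim about the actual fleet.

```python
def percent_reduction(before, after):
    """Percentage by which a quantity fell from 'before' to 'after'."""
    return (before - after) / before * 100


def growth_factor(percent_increase):
    """Convert 'an X percent increase' into a multiplier."""
    return 1 + percent_increase / 100


# Figures quoted in the article.
floor_space = percent_reduction(11_000, 5_400)   # sft. before and after
server_count = percent_reduction(3_000, 2_700)   # servers before and after
compute_factor = growth_factor(154)              # 154 percent more compute

# Derived average, assuming compute is spread evenly across servers.
per_server_factor = compute_factor * 3_000 / 2_700

print(f"Floor space reduction:  {floor_space:.1f}%")
print(f"Server count reduction: {server_count:.1f}%")
print(f"Total compute:          {compute_factor:.2f}x the original")
print(f"Average per server:     {per_server_factor:.2f}x the original")
```

Since the article says the new footprint is "less than 5,400 sft.", the floor space figure computed here is a lower bound that rounds to the quoted 51 percent.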