Small is Better as a SAN Best Practice
Nigel at the RupturedMonkey blog site wrote a post titled "Grrrrr big beefy manly SANs". He discusses several practices that I think belong on any list of SAN best practices. The post has an interesting confessional tone: he admits to a certain fascination with large, complex networks, something I think is common in our line of work. One of the things he discusses is limiting the number of ISLs (inter-switch links) in a SAN as a way to reduce problems. Here is one of his paragraphs:
"One thing that had a real influence on me was some work that I did for a telco company who have a policy of no ISLs in their SAN environments. A lot of people initially sniggle at the idea, and to be honest, I too raised an eyebrow when I first heard this. But I have to say that in their SAN environments problems were noticeably fewer and farther between, and when they did occur they were so much more limited in their scope and so much easier to troubleshoot."
A common mistake when thinking about SANs (especially iSCSI SANs) is assuming that the same design principles and best practices apply to both LANs and SANs. A simple examination of the requirements reveals profound differences. On a LAN, systems communicate frequently with other systems, and any one system may need to communicate with every other system. On a SAN, a system usually has no reason to communicate with more than a few other systems or storage targets. In fact, most SANs implement technologies and processes, such as zoning and LUN masking, specifically to restrict potentially harmful communications. Put another way, LANs assume any-to-any communication, whereas SANs assume a need-to-communicate-only operating model.
An excellent strategy for providing a need-to-communicate-only environment is to reduce the number of potential connections. In general, it is difficult to err in the pursuit of restricting SAN communications, and this should be a design goal carried through the initial implementation and every subsequent change to the SAN. Pursue policies that limit the number of ports in a SAN.
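To make the difference between the two operating models concrete, the short Python sketch below counts the pairs of nodes that could talk in an any-to-any model versus the pairs that are actually zoned together. It assumes a hypothetical fabric of six servers and two storage ports with single-initiator zones (one server plus one storage port per zone); the node and zone names are made up for illustration.

```python
from itertools import combinations


def any_to_any_pairs(nodes):
    """Every unordered pair of nodes that could talk in an any-to-any (LAN-style) model."""
    return set(combinations(sorted(nodes), 2))


def need_to_communicate_pairs(zones):
    """Only the pairs whose members share a zone (need-to-communicate-only model)."""
    pairs = set()
    for members in zones.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical fabric: six servers and two storage ports.
servers = [f"srv{i}" for i in range(1, 7)]
storage = ["array_a", "array_b"]

# Hypothetical single-initiator zoning: each server is zoned with one storage port.
zones = {
    f"z_{srv}": {srv, "array_a" if i % 2 else "array_b"}
    for i, srv in enumerate(servers)
}

print(len(any_to_any_pairs(servers + storage)))  # 28 potential pairs
print(len(need_to_communicate_pairs(zones)))     # 6 zoned pairs
```

Even in this tiny example, zoning cuts the potential conversations from 28 to 6, and the gap grows roughly with the square of the node count as the SAN gets bigger.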
Limit the use of ISLs to prevent "connection creep" in the SAN. Core-edge designs with ISLs should not be blindly assumed to be the best way to add nodes to a SAN. Instead, consider using a single layer of switches that operate in parallel to provide a single-hop environment. Stackable switches such as the Cisco 3750 for iSCSI or the QLogic 5200 for Fibre Channel offer valuable scalability benefits.
Core-edge designs should be implemented with the goal of limiting the number of hops. The best core-edge designs have two hops between systems and storage. One way to achieve this is to connect storage ports to the core switches and servers to the edge switches.
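One way to sanity-check a proposed layout is to count the switches a frame must pass through between a server and its storage. The Python sketch below does a breadth-first search over a hypothetical topology, assuming that "hops" means switches traversed and that switch names start with `sw_` (both are conventions invented for this example). It compares storage attached to the core, storage attached to a second edge switch, and a flat single-layer design.

```python
from collections import deque


def switches_traversed(links, src, dst):
    """Breadth-first search over an undirected topology. Returns the number of
    switch nodes (names starting with 'sw_') on the shortest src->dst path,
    or None if dst is unreachable."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    prev = {src: None}  # remembers how each node was reached
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)

    if dst not in prev:
        return None

    # Walk back from dst to src and count the switches on the path.
    count, node = 0, dst
    while node is not None:
        if node.startswith("sw_"):
            count += 1
        node = prev[node]
    return count


# Hypothetical core-edge fabric: two edge switches joined by ISLs to one core.
core_edge = [
    ("srv1", "sw_edge1"),
    ("sw_edge1", "sw_core"),      # ISL
    ("sw_edge2", "sw_core"),      # ISL
    ("array_core", "sw_core"),    # storage attached at the core
    ("array_edge", "sw_edge2"),   # storage attached at another edge
]

# Hypothetical single-layer design: server and storage on the same switch.
flat = [("srv2", "sw_flat"), ("array_flat", "sw_flat")]

print(switches_traversed(core_edge, "srv1", "array_core"))  # 2
print(switches_traversed(core_edge, "srv1", "array_edge"))  # 3
print(switches_traversed(flat, "srv2", "array_flat"))       # 1
```

Placing storage on the core keeps every server within two switches of its storage, while hanging storage off another edge switch adds a third; the single-layer design needs only one.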
Consider using separate SANs for disk I/O and backup. For instance, primary disk I/O might require dual paths with independent switches in each path, whereas backup might require only a single path between servers and backup devices or systems. Separating them limits the impact that backup and restore operations have on production disk I/O. It also resolves many of the problems that arise from the conflicting connectivity models of primary disk I/O and backup, because backup connectivity requirements tend to be much broader than disk-access requirements. For example, a dual-port HBA/NIC could connect to the disk SAN while a single-port HBA/NIC connects to the backup SAN.
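Splitting the traffic this way also keeps each fabric small. The Python sketch below estimates the switch ports needed per fabric, assuming each server's dual-port HBA/NIC is split across two independent disk fabrics ("A" and "B") and its single-port HBA/NIC sits on one backup fabric. The server, storage, and backup-target counts and the 24-port switch size are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FabricPlan:
    """Switch ports required on each of the three fabrics."""
    disk_a: int
    disk_b: int
    backup: int

    def fits_single_switch(self, switch_ports):
        """True if every fabric fits on one switch (a single-hop design)."""
        return max(self.disk_a, self.disk_b, self.backup) <= switch_ports


def plan_ports(servers, storage_ports_per_fabric, backup_targets):
    """Each server uses one port on disk fabric A, one on disk fabric B
    (the dual-port HBA/NIC) and one on the backup fabric (single-port HBA/NIC)."""
    return FabricPlan(
        disk_a=servers + storage_ports_per_fabric,
        disk_b=servers + storage_ports_per_fabric,
        backup=servers + backup_targets,
    )


# Hypothetical sizing: 20 servers, 4 storage ports per disk fabric, 2 backup targets.
plan = plan_ports(servers=20, storage_ports_per_fabric=4, backup_targets=2)
print(plan)                          # FabricPlan(disk_a=24, disk_b=24, backup=22)
print(plan.fits_single_switch(24))   # True
```

With these assumed numbers, each fabric fits on a single 24-port switch, so no ISLs are needed anywhere; had the backup ports been added to the disk fabrics instead, each of those would have outgrown one switch.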
