Nigel at the RupturedMonkey blog site wrote a post titled "Grrrrr big beefy manly SANs". He talks about a number of things that I think
belong in any set of SAN best practices. The post has an interesting confessional tone where he discusses a certain fascination with
large and complex networks, something that I think is common in our line of work. One of the things he discusses is limiting the
number of ISLs in SANs as a way to reduce problems. I quote one of his paragraphs below:
"One thing that had a real influence on me was some work that I did for a telco company who have a policy of no ISLs in their
SAN environments. A lot of people initially sniggle at the idea, and to be honest, I too raised an eyebrow when I first heard this. But
I have to say that in their SAN environments problems were noticeably fewer and farther between, and when they did occur they
were so much more limited in their scope and so much easier to troubleshoot."
A common mistake people make when thinking about SANs (especially iSCSI SANs) is assuming the same design principles
and best practices apply to both LANs and SANs. A simple examination of the requirements indicates profound differences: For
instance, with LANs, systems communicate frequently with other systems with the potential for any one system to communicate with
every other system, but with SANs, systems usually don't have any reason to communicate with many other systems and storage. In
fact, most SANs implement technology and processes to restrict potentially harmful communications. Put another way, LANs
assume any-to-any communications whereas SANs assume a need-to-communicate-only operating model.
An excellent strategy for providing a need-to-communicate-only environment is to reduce the number of potential connections. In
general, it is difficult to err in the pursuit of restricting SAN communications and this should be a design goal that is carried through
the initial implementation and all subsequent changes to the SAN. Pursue policies that limit the number of ports in a SAN.
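To put rough numbers on the difference between the two models, here is a minimal Python sketch that counts the potential
connections in a small fabric. The host names, storage port names, and the one-server-to-one-storage-port zoning are all
hypothetical and chosen only for illustration; real zoning is done in the switch or array tools, not in code like this.

```python
from itertools import combinations

def any_to_any_pairs(ports):
    """Every unordered pair of ports, as a LAN-style model would allow."""
    return {frozenset(pair) for pair in combinations(ports, 2)}

def zoned_pairs(zones):
    """Only the server/storage pairs that are explicitly listed."""
    allowed = set()
    for initiator, targets in zones.items():
        for target in targets:
            allowed.add(frozenset((initiator, target)))
    return allowed

# Hypothetical fabric: 8 servers and 2 storage ports.
servers = [f"server{i}" for i in range(1, 9)]
storage = ["array_a", "array_b"]
ports = servers + storage

# Assumption: each server needs exactly one storage port.
zones = {s: [storage[i % 2]] for i, s in enumerate(servers)}

lan_style = any_to_any_pairs(ports)
san_style = zoned_pairs(zones)
print(len(lan_style), len(san_style))  # prints: 45 8
```

With only ten ports the any-to-any model already allows 45 pairings, while the need-to-communicate-only model needs 8. The gap
widens quickly as ports are added, which is the arithmetic behind limiting port counts.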
The use of ISLs should be limited as a way to prevent "connection creep" in the SAN. Core-edge designs with ISLs should not
be blindly assumed as the best way to add nodes in the SAN. Instead, think about using a single layer of switches that operate in
parallel providing a single-hop environment. Stackable switches such as the Cisco 3750 for iSCSI or the QLogic 5200 for Fibre
Channel have valuable scalability benefits.
Core-edge designs should be implemented with a goal of limiting the number of hops. The best core-edge designs have no more than
two hops between systems and storage. One way to do this is to connect storage ports to core switches and servers at the
edge.
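The hop counts for these layouts can be checked with a small Python sketch. It treats each switch traversed as one hop, which
is how I am using the term here, and it assumes end devices attach only to switches. The device and switch names and the three
topologies are made up for illustration.

```python
from collections import deque

def switch_hops(links, src, dst):
    """Switches on the shortest path from src to dst, or None if unreachable.

    Assumes end devices connect only to switches, so the number of
    switches traversed is the number of links on the path minus one.
    """
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist - 1
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Single layer of switches: server and storage on the same switch.
single_layer = [("server1", "sw_a"), ("sw_a", "array1")]

# Core-edge with storage on the core and servers at the edge.
core_edge = [("server1", "sw_edge1"), ("sw_edge1", "sw_core"),
             ("sw_core", "array1")]

# Core-edge with storage also hanging off an edge switch.
edge_core_edge = [("server1", "sw_edge1"), ("sw_edge1", "sw_core"),
                  ("sw_core", "sw_edge2"), ("sw_edge2", "array1")]

print(switch_hops(single_layer, "server1", "array1"))    # prints: 1
print(switch_hops(core_edge, "server1", "array1"))       # prints: 2
print(switch_hops(edge_core_edge, "server1", "array1"))  # prints: 3
```

Moving storage off the core and onto an edge switch adds a third hop, which is why the storage ports belong on the core.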
Consider using different SANs for disk I/O and backup. For instance, primary disk I/O might require dual paths with independent
switches in those paths, whereas backup might only require a single path between servers and backup devices/systems. This limits
the impact that backup and restore operations have on production disk I/O operations and solves many of the problems originating
from the conflicting connectivity models for primary disk I/O and backup (the connectivity requirements for backup tend to be much
broader than the connectivity requirements for disk access). For example, a dual-port HBA/NIC could be used to connect to the disk
SAN while a single-port HBA/NIC could be used to connect to the backup SAN.
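The different path requirements can be written down as a simple check. This is only a sketch under assumptions: each path is
described as a list of hypothetical switch names, "independent" is taken to mean that no switch appears in more than one path,
and the required path counts (two for disk, one for backup) come from the example above.

```python
def paths_are_independent(paths):
    """True if no switch appears in more than one path."""
    seen = set()
    for path in paths:
        switches = set(path)
        if seen & switches:
            return False
        seen |= switches
    return True

def meets_policy(paths, required_paths):
    """True if there are enough paths and they share no switches."""
    return len(paths) >= required_paths and paths_are_independent(paths)

# Hypothetical server: dual-port HBA to the disk SAN, single port to backup.
disk_paths = [["sw_disk_a"], ["sw_disk_b"]]
backup_paths = [["sw_backup"]]

print(meets_policy(disk_paths, required_paths=2))    # prints: True
print(meets_policy(backup_paths, required_paths=1))  # prints: True

# Both HBA ports cabled to the same switch fails the disk policy.
print(meets_policy([["sw_disk_a"], ["sw_disk_a"]], required_paths=2))  # prints: False
```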