
The "secret" to the future of gaming is that bandwidth means little if you can't actually move that much data through the bus fast enough. The same is true of other technologies such as UltraDMA-33, 66, 100, and 133, as well as SATA 1.5Gb/sec (about 150MB/sec usable) and SATA II 3.0Gb/sec (about 300MB/sec usable). Or even network speeds of 10Mb/sec, 100Mb/sec, and 1Gb/sec. The bandwidth may be large, but it is only reached for a tiny fraction of the time data is being moved across it. Every single SATA II drive out there right now (except maybe SSDs) wouldn't be able to move data any faster than it would over a regular SATA 1.5Gb/sec connection. Of course burst transfers may peak past this, but that doesn't really affect overall performance.
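To put rough numbers on that, here is a minimal Python sketch. It assumes 8b/10b encoding on the SATA link (10 bits on the wire for every 8 bits of data), and the drive's sustained rate is a made-up placeholder, not a measurement of any real drive.

```python
def sata_usable_mb_per_sec(line_rate_gbps: float) -> float:
    """Convert a raw SATA line rate (Gb/sec) to usable MB/sec under 8b/10b encoding."""
    data_bits_per_sec = line_rate_gbps * 1e9 * 8 / 10  # strip encoding overhead
    return data_bits_per_sec / 8 / 1e6                 # bits -> bytes -> MB


def sustained_mb_per_sec(link_mb: float, drive_mb: float) -> float:
    """Sustained throughput is capped by the slower of the link and the drive."""
    return min(link_mb, drive_mb)


# Hypothetical sustained rate for a mechanical drive (assumption for illustration).
HYPOTHETICAL_DRIVE_MB = 80.0

sata1_mb = sata_usable_mb_per_sec(1.5)
sata2_mb = sata_usable_mb_per_sec(3.0)

print("SATA 1.5Gb/sec link:", sata1_mb, "MB/sec")
print("SATA 3.0Gb/sec link:", sata2_mb, "MB/sec")
print("Drive on SATA I: ", sustained_mb_per_sec(sata1_mb, HYPOTHETICAL_DRIVE_MB), "MB/sec")
print("Drive on SATA II:", sustained_mb_per_sec(sata2_mb, HYPOTHETICAL_DRIVE_MB), "MB/sec")
```

With a drive that slow, both links give the same sustained number. The faster link only matters once the drive itself can outrun the slower one.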
So, with that being said, PCI-Express 2.0 will run into the same limitation in the near future. Someday it will be useful and GPUs will utilize some of that extra bandwidth, but for now, it's just future-proofing. This is only indirectly related to quad SLI/Crossfire, because having more bandwidth and additional GPUs doesn't change the fact that all of the video data traffic still has to go through the PCI-Express bus. If you saturate the bus (only xx number of PCI-Express lanes are available), additional graphics processing power won't help much.
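The saturation point can be sketched in a few lines of Python. The per-lane figures assume 8b/10b encoding at 2.5GT/sec (PCI-Express 1.x) and 5GT/sec (PCI-Express 2.0). The lane count, the per-GPU traffic figure, and the idea that every GPU shares one pool of lanes are simplifying assumptions for illustration only.

```python
def pcie_lane_mb_per_sec(gt_per_sec: float) -> float:
    """Usable MB/sec per lane, per direction, under 8b/10b encoding."""
    return gt_per_sec * 1e9 * 8 / 10 / 8 / 1e6


def delivered_fraction(num_gpus: int, per_gpu_demand_mb: float, bus_mb: float) -> float:
    """Fraction of the requested traffic the shared bus can carry (1.0 = no bottleneck)."""
    total_demand = num_gpus * per_gpu_demand_mb
    if total_demand <= 0:
        return 1.0
    return min(1.0, bus_mb / total_demand)


ASSUMED_LANES = 16            # assumption: one x16 worth of lanes shared by all GPUs
HYPOTHETICAL_DEMAND = 3000.0  # assumption: MB/sec of bus traffic each GPU wants

for name, rate in (("PCIe 1.x", 2.5), ("PCIe 2.0", 5.0)):
    bus_mb = pcie_lane_mb_per_sec(rate) * ASSUMED_LANES
    for gpus in (1, 2, 4):
        share = delivered_fraction(gpus, HYPOTHETICAL_DEMAND, bus_mb)
        print(f"{name} x{ASSUMED_LANES}, {gpus} GPU(s): {share:.0%} of requested traffic")
```

Once the fraction drops below 100%, every extra GPU just waits longer for its data, which is the saturation case described above.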
This is why PCs still have bottlenecks. The front side bus is a killer for Intel CPUs, because it's a serious bottleneck, even for the Qxxxx quad-core processors from Intel. If we weren't limited by the FSB, our rigs would scream with speed! This goes back to the old comparison of latency vs. speed. Always put latency as a higher priority than speed, because latency is your biggest bottleneck and will make even the fastest rigs appear slow for mundane tasks. For a hypothetical example, if data takes 2 picoseconds to reach a CPU, and the bus can carry 128 bits of data at once, it still takes 2 picoseconds for the data to arrive, regardless of how many bits are traveling at one time (up to 128, with additional data taking extra clock cycles).
So a relatively small chunk of data takes the same amount of time to move across a 1,333MHz bus whether it is headed to a 3.6GHz CPU or a 2.4GHz CPU.
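The hypothetical above can be written as a tiny model. This is only a sketch: the 2 picosecond latency and 128-bit width come from the example, while the cost of each extra cycle (`cycle_ps`) is an assumed value picked for illustration.

```python
import math


def transfer_time_ps(bits: int,
                     latency_ps: float = 2.0,
                     bus_width_bits: int = 128,
                     cycle_ps: float = 2.0) -> float:
    """Time for `bits` of data to arrive over a bus with fixed latency.

    The first bus-width worth of data arrives after `latency_ps`.
    Every additional bus-width of data costs one more cycle (`cycle_ps`,
    an assumed value, not taken from any real hardware).
    """
    if bits <= 0:
        raise ValueError("bits must be positive")
    trips = math.ceil(bits / bus_width_bits)
    return latency_ps + (trips - 1) * cycle_ps


for size in (1, 64, 128, 129, 256, 1024):
    print(f"{size:5d} bits -> {transfer_time_ps(size)} ps")
```

Anything up to 128 bits costs the same 2 picoseconds, so for small transfers only the latency matters. CPU clock speed doesn't appear in the model at all, which is the point of the 3.6GHz vs. 2.4GHz comparison.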
This is only one of the many reasons why there are diminishing performance gains to multiple CPU cores, multiple GPUs, and multiple video cards. On to trend forecasting...
If you're interested, you might want to check out the history of the PCI bus and how it overtook the VESA Local Bus extension to ISA. That evolution looks very similar to the move from PCI/AGP to PCI-Express. Look specifically at the reasons for moving to PCI, and compare them with what actually happened in the market and why ISA was still used for a long time afterward for simple expansion cards, such as sound and I/O.
For quad SLI/Crossfire to be successful, the market switch from PCI to PCI-Express is moving far too slowly. Manufacturers may try to market and push for this configuration, but with our current limitations it will not be nearly as profitable as it could be.
Imagine the possibilities in the future:
If you have a motherboard with all PCI-Express 16x slots (e.g. 6 to 8 of them), both SLI and Crossfire become a thing of the past, provided drivers mature to handle any multi-GPU configuration across the bus alone (possible with the increased bandwidth of the PCI-Express 2.0 standard). Imagine being able to place any number of video cards along this bus and have them talk to each other, designate one as a "master", and output rendered frames to a monitor using the processing power of multiple GPUs. Imagine if it didn't matter what board you have, as it would work with video cards using any manufacturer's GPU.
That could be the future of high-end gaming, not SLI or Crossfire per se.