One issue is that no x86 consumer CPU has, or will have, an NVLink controller built in (that will be limited to custom ARM and POWER architecture chips, unless Intel and/or AMD suddenly decide to include a proprietary controller for a competitor's HPC cards). This means that any consumer GPU would need either both a PCIe controller AND an NVLink controller on-die, or an outboard chip purely to translate between NVLink and PCIe.
An on-die controller would take up a noticeable amount of die space that could otherwise go to hardware that actually performs rendering work, and that space would be wasted in the vast majority of systems, which use a single GPU.
An off-die translation controller would potentially impact performance, and would definitely add a significant extra expense to every card for a custom chip.
Nvidia could end up making a dedicated HPC chip with NVLink but no PCIe, putting a hard split between their consumer dies and HPC dies. Or, since the move to a 14nm process gives them some headroom, they could eat the loss of die area and use NVLink as a binning tool, disabling it on all (or all but the very high-end) consumer cards.
Either way, PCIe has not yet been a significant bottleneck to multi-GPU performance. Until DX12 and Vulkan become more widespread and PCIe bus loads increase with the draw-call bottleneck removed, NVLink probably won't be an important factor for multi-GPU gaming. I also haven't heard much from Nvidia recently about Unified Memory coming with Pascal. It's still on their slides, but it got pushed back from Maxwell to Pascal pretty quietly, so I wouldn't be surprised to see it pushed back from Pascal to Volta too.