Again, you're getting mixed up between two different things:
The HRTF is a static transform that takes into account things that don't change dynamically (i.e. your head remains your head) and applies it to an audio stream. This can be applied to a stereo or a 5.1 mix (or 7.1, or however many discrete channels you want to downmix to). The modelling of sound propagation around the head is done once offline to produce the transform, and this transform is used for all real-time processing. Because it's a fixed transform it is very simple and efficient to apply, but it also only applies to the head-related effects of sound (i.e. those that allow easier discrimination of source direction).
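To make the "fixed transform" point concrete, here is a minimal sketch assuming the HRTF is stored in its time-domain form as a pair of impulse responses (HRIRs) per speaker channel, and that applying it is just a convolution. The function names, the dict layout, and the toy filter values are made up for illustration; real HRIRs are measured or simulated offline and are far longer.

```python
import numpy as np

def apply_hrir(signal, hrir_left, hrir_right):
    """Convolve one channel with a fixed left/right impulse-response pair."""
    left = np.convolve(signal, hrir_left)
    right = np.convolve(signal, hrir_right)
    return np.stack([left, right])  # shape (2, len(signal) + len(hrir) - 1)

def binaural_downmix(channels, hrirs):
    """Sum the per-channel contributions into one two-ear output.

    channels: dict of name -> 1-D array, all the same length (hypothetical layout)
    hrirs:    dict of name -> (hrir_left, hrir_right), all the same length
    """
    out = None
    for name, signal in channels.items():
        contrib = apply_hrir(signal, *hrirs[name])
        out = contrib if out is None else out + contrib
    return out

# Toy data only: the filters never change at run time, only the input mix does.
toy_hrirs = {
    "front_left":  (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.5, 0.0])),
    "front_right": (np.array([0.0, 0.5, 0.0]), np.array([1.0, 0.0, 0.0])),
}
toy_mix = {
    "front_left":  np.array([1.0, 2.0, 3.0]),
    "front_right": np.array([0.0, 0.0, 0.0]),
}
ears = binaural_downmix(toy_mix, toy_hrirs)
```

Nothing in there knows about the scene; whatever the sound engine outputs goes through the same filters every frame.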
The sound engine is what deals with the dynamic effects of the (relative to the head) moving environment on sound sources within the environment. Reflections, scattering, frequency-dependent attenuation, etc. It's what generates the stereo (or multichannel) mix that then gets fed to the HRTF. You could in theory model the head in this engine and use that in lieu of an HRTF, but you'd just be wasting processing time for no benefit (and depending on the fidelity your real-time engine is capable of, possibly even get worse results).
Putting an SSD on the card has no benefit for gaming use (and is functionally equivalent to the DMA access over the PCIe bus that has been used for years). Getting textures from backing storage over the PCIe bus to the card is not a performance bottleneck. If you were to try to keep textures out of vRAM and only load them on the fly (as opposed to the current practice of caching every level texture in vRAM until you run out of vRAM or run out of textures, and aggressively flushing those cached textures if that space is needed for active tasks), then you would only see a performance regression.
That's down to texture filtering; increasing texture resolution would only make the problem worse (namely by introducing aliasing). MIPmaps change level based on the pixel-to-texel ratio (how many texels a pixel samples), which depends on absolute MIPmap size, not relative MIPmap size. If the optimum MIPmap level is 64x64 for a given draw distance, the 64x64 MIPmap will be used regardless of what the MIP level 0 ('native' texture) resolution is. Thus, increasing texture resolution enables higher fidelity for closer and closer objects, but does not affect fidelity once you start stepping down MIP levels. See this image for example: anything at the 512x512 MIP or below would be unaffected by any increase in texture size.
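The MIP argument can be checked with a bit of arithmetic. The sketch below is a deliberately simplified model, not how any particular GPU does it: it assumes a square power-of-two texture, picks a single level by flooring log2 of the texels-per-pixel ratio, and ignores trilinear blending, anisotropic filtering and LOD bias. `selected_mip` and its parameters are names invented for this example.

```python
import math

def selected_mip(base_size, pixels_covered):
    """Return (mip_level, mip_size) for a square texture of base_size texels
    per side that spans pixels_covered screen pixels per side.

    Simplified model: step down one level for each doubling of the
    texels-per-pixel ratio, never going below level 0.
    """
    ratio = base_size / pixels_covered        # texels per pixel at level 0
    level = math.floor(math.log2(ratio)) if ratio > 1 else 0
    level = min(level, int(math.log2(base_size)))  # smallest mip is 1x1
    return level, base_size >> level

# Distant object covering 64 pixels: both textures end up on the 64x64 mip.
far_1k = selected_mip(1024, 64)    # (4, 64)
far_4k = selected_mip(4096, 64)    # (6, 64)

# Close object covering 2048 pixels: only now does the larger texture help.
near_1k = selected_mip(1024, 2048)  # (0, 1024)
near_4k = selected_mip(4096, 2048)  # (1, 2048)
```

The level number differs between the two textures, but the mip that actually gets sampled at distance is the same size, which is why bumping the native resolution does nothing there.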