just thinking out loud: what about a really simple 'terminator-like' device plus a driver? the terminator would be plugged in instead of a real pcie device.
e.g. the terminator could be something that loops data back on each lane (or across lanes, etc.). it has to be an active component because it needs to negotiate the pcie version, the number of lanes used, the TLP length, etc. a passive loopback would be too simple for that.
so a pcie device with an fpga would do it. the problem is that those are really expensive, which kills the whole idea of 'cheap' testing. altera and xilinx might have some nice boards.
unfortunately a cheap(er) x1 device is not an option: most likely we cannot assume that if one lane has good quality then all lanes do, so as far as signal integrity goes this would be an invalid test for multi-lane risers.
Also, a NIC with enough bandwidth is really expensive, too. a 2x100G NIC doing loopback would be pretty close to what I am thinking about (either with a loopback cable or configured for loopback at chip level). the problem is not only the expensive NIC but also the CPU required to generate and capture that amount of data. the traffic generator could be something like
https://trex-tgn.cisco.com/ or
http://dpdk.org/browse/apps/pktgen-dpdk/refs/
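to put rough numbers on "enough bandwidth", here is a small back-of-the-envelope sketch. it only uses the published per-lane signalling rates and line coding of PCIe Gen1 to Gen4 and ignores TLP/DLLP protocol overhead, so real throughput will be lower. the function and variable names are just made up for illustration.

```python
# Raw signalling rate (GT/s per lane) and line-code efficiency per PCIe generation.
# Gen1/Gen2 use 8b/10b coding, Gen3/Gen4 use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def link_gbps(gen: int, lanes: int) -> float:
    """Per-direction link bandwidth in Gbit/s after line coding.

    TLP/DLLP protocol overhead is NOT subtracted, so this is an upper bound.
    """
    rate, efficiency = PCIE_GENERATIONS[gen]
    return rate * efficiency * lanes


if __name__ == "__main__":
    nic_gbps = 2 * 100  # both ports of a 2x100G NIC at line rate
    for gen in (3, 4):
        for lanes in (1, 8, 16):
            bw = link_gbps(gen, lanes)
            if nic_gbps >= bw:
                verdict = "NIC can saturate the link"
            else:
                verdict = "link is faster than the NIC"
            print(f"Gen{gen} x{lanes}: {bw:6.1f} Gbit/s per direction -> {verdict}")
```

a Gen3 x16 link tops out at roughly 126 Gbit/s per direction by this estimate, so a 2x100G NIC would be enough to fill it, while a single 100G port would not quite get there.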
another option could be to add this feature to an existing video card driver and write a tool like FurMark. or use FurMark itself, if some settings make it generate really high traffic on pcie. Intel's PCM (Performance Counter Monitor) or Microsoft's Xperf might give some information about the occupancy of the lanes, but I am unaware of any counters for things like retransmits.
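on the counter question, one place that might be worth a look, assuming the test box runs a reasonably recent Linux kernel and the device behind the riser exposes the AER capability: sysfs has per-device files for the negotiated link speed/width and for correctable error counts (`aer_dev_correctable`, which includes replay-related entries such as `Rollover` and `Timeout`). the sketch below just reads those files. whether they exist depends on kernel and hardware, and the helper names are made up for illustration.

```python
import sys
from pathlib import Path
from typing import Optional

SYSFS_PCI = Path("/sys/bus/pci/devices")  # Linux only

LINK_ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def parse_aer_counters(text: str) -> dict:
    """Parse 'Name count' lines (the aer_dev_correctable layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_attr(dev: Path, name: str) -> Optional[str]:
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def link_report(bdf: str) -> dict:
    """Collect link status and correctable error counters for one device."""
    dev = SYSFS_PCI / bdf
    report = {name: read_attr(dev, name) for name in LINK_ATTRS}
    aer = read_attr(dev, "aer_dev_correctable")
    report["correctable_errors"] = parse_aer_counters(aer) if aer is not None else None
    return report


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: link_report.py <domain:bus:dev.fn>")
    for key, value in link_report(sys.argv[1]).items():
        print(f"{key}: {value}")
```

the idea would be to take a report before and after a traffic run and compare: a link that trained below its maximum speed or width, or counters that grew during the run, would be a hint that the riser is marginal.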
sorry for dumping all of this here.
I think the benchmarks done e.g. with GPUs are usually insufficient because of the low bandwidth they use. but it would be nice to have our own quality measurement. unfortunately everything above only holds for the given environment and does not show how the riser would behave in another environment...