>The best thing i see coming from vulkan and dx12 are the increased drawcalls and low cpu overhead
Async compute is almost entirely irrelevant for this. The part of DX12/Vulkan that matters for increasing draw calls and lowering CPU overhead is multi-threaded command lists, which Nvidia has already supported since DX11 (hence their lower CPU overhead in DX11 compared to AMD). Leveraging DCLs, you can offload tasks to async compute too, but the BIG benefit is from the multi-threaded rendering itself. Basically, DX12/Vulkan brings AMD to Nvidia's level in the overhead department.
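To make the multi-threaded command list idea concrete, here is a rough toy model in Python. It is not real D3D or Vulkan code: `record_command_list` and `build_frame` are made-up names, and the "commands" are just strings. The only point is the shape of the work: several threads record their own lists independently, and one thread submits them in order.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draws):
    """Record one chunk of draw calls into its own command list (toy model)."""
    return [f"draw({d})" for d in draws]


def build_frame(draw_ids, workers=4):
    """Record draws on several threads, then flatten them for in-order submission."""
    # Split the frame's draws into one contiguous chunk per worker.
    chunk = max(1, -(-len(draw_ids) // workers))  # ceiling division
    chunks = [draw_ids[i:i + chunk] for i in range(0, len(draw_ids), chunk)]

    # Each worker records its chunk independently; no shared state is touched.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        command_lists = list(pool.map(record_command_list, chunks))

    # Submission happens on a single thread, in the original order.
    return [cmd for command_list in command_lists for cmd in command_list]


if __name__ == "__main__":
    print(build_frame(list(range(10)), workers=3))
```

Python threads won't actually show the speedup because of the GIL, so treat this purely as an illustration of the structure: recording is what gets spread across cores, and that is where the CPU overhead win comes from.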
>But nvidia's method seems like it would be inefficient on it
Like everything, it depends. Nvidia designed Maxwell's (and Pascal's) software scheduler around lowering frametimes as much as possible by dynamically distributing the workload across the GPCs/SMs. Nvidia's software contingent is leagues larger than AMD's, and they have spent a LOT of money to make their underlying algorithms and drivers speedy. Their choice was to dynamically distribute load using the software scheduler, and it's very fast at it. AMD's choice was to dynamically distribute load using a hardware scheduler, and when all the parts of the GPU can be used (AMD couldn't in DX11) it is very fast at it.
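For anyone wondering what "dynamically distribute load" means in the abstract, here is a generic greedy sketch. To be clear, this is NOT Nvidia's or AMD's actual scheduling algorithm (neither is public in that detail); `distribute`, the cost numbers, and the "units" are all hypothetical. It only shows the basic idea of handing each piece of work to whichever unit is least busy right now.

```python
import heapq


def distribute(work_costs, num_units):
    """Greedy distribution: each work item goes to the currently least-loaded unit."""
    # Min-heap of (accumulated load, unit index), so the idlest unit pops first.
    heap = [(0, unit) for unit in range(num_units)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_units)]

    for item, cost in enumerate(work_costs):
        load, unit = heapq.heappop(heap)
        assignment[unit].append(item)
        heapq.heappush(heap, (load + cost, unit))

    # Read the final per-unit loads back out of the heap.
    loads = [0] * num_units
    for load, unit in heap:
        loads[unit] = load
    return assignment, loads


if __name__ == "__main__":
    print(distribute([5, 1, 1, 1, 1, 1], num_units=2))
```

Whether that decision is made by driver software or by dedicated hardware, the goal is the same: keep every unit busy so the frame finishes sooner.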
People who don't understand this will say things like "hardware is ALWAYS faster" or "software is EMULATING" (implying it's bad), but great software can overcome bad hardware (I'm NOT saying AMD's hardware scheduler is bad!), and vice versa.
As an example, look at the raw specs of the R9 390 vs the GTX 970.