The exact benchmarking configuration depends heavily on your physical host. As a starting point, set the number of threads to half the number of CPU cores. Begin with a queue depth (QD) of 8, then raise it to 16 and 32 until you reach maximum performance. The most useful patterns to test are 4k and 8k random read/write and 64k sequential read/write, since these are the most common in virtualized environments.
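The test plan above can be expressed as a small matrix generator. The sketch below assumes fio as the benchmarking tool (the text does not name one) and uses only standard fio options (`--rw`, `--bs`, `--iodepth`, `--numjobs`, `--ioengine=libaio`, `--direct=1`). The target path `/dev/sdX`, the 60-second runtime, and the 5% "stopped improving" threshold in `saturation_qd` are hypothetical placeholders. Note that `os.cpu_count()` reports logical CPUs, so halve the physical core count yourself if SMT is enabled. Write tests against a raw device overwrite its contents.

```python
import os
import shlex

# Access patterns from the text: (fio rw mode, block size).
PATTERNS = [
    ("randread", "4k"),
    ("randwrite", "4k"),
    ("randread", "8k"),
    ("randwrite", "8k"),
    ("read", "64k"),
    ("write", "64k"),
]

# Queue depths to step through: start at 8, then 16 and 32.
QUEUE_DEPTHS = [8, 16, 32]


def default_threads(cores: int) -> int:
    """Start with half the cores, but never fewer than one thread."""
    return max(1, cores // 2)


def fio_command(target: str, rw: str, bs: str, iodepth: int, numjobs: int,
                runtime_s: int = 60) -> list[str]:
    """Build one fio invocation. `target` and `runtime_s` are assumptions."""
    return [
        "fio",
        f"--name={rw}-{bs}-qd{iodepth}",
        f"--filename={target}",
        "--ioengine=libaio",
        "--direct=1",          # bypass the page cache
        f"--rw={rw}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",   # aggregate results across jobs
    ]


def build_matrix(target: str, cores: int) -> list[list[str]]:
    """One fio command per (pattern, queue depth) pair."""
    threads = default_threads(cores)
    return [
        fio_command(target, rw, bs, qd, threads)
        for rw, bs in PATTERNS
        for qd in QUEUE_DEPTHS
    ]


def saturation_qd(iops_by_qd: dict[int, float], min_gain: float = 0.05) -> int:
    """Return the first QD whose next step improves IOPS by less than min_gain.

    The 5% default threshold is a hypothetical choice, not a standard value.
    """
    qds = sorted(iops_by_qd)
    for cur, nxt in zip(qds, qds[1:]):
        if iops_by_qd[nxt] < iops_by_qd[cur] * (1 + min_gain):
            return cur
    return qds[-1]


if __name__ == "__main__":
    # Logical CPU count; adjust if you want physical cores only.
    cores = os.cpu_count() or 2
    for cmd in build_matrix("/dev/sdX", cores):
        print(shlex.join(cmd))
```

Run each printed command, record IOPS per queue depth for a given pattern, and pass them to `saturation_qd` to see where stepping up the queue depth stops paying off. With made-up measurements `{8: 100, 16: 180, 32: 185}`, it picks 16, because going to 32 adds less than 5%.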