Indeed, this is, in our opinion, the only valid way of testing. It's time consuming (do the maths and you will see that in this single assessment, our guys had to perform just under 12,000 individual tests!) but you get results that map most closely to real world use cases.
Also, we think including the time to detect missed samples is an interesting metric - as you don't want these things running on your endpoint for any great length of time - particularly financial malware.
We really strive to make testing accurate because, if we can't measure a product's efficacy accurately and appropriately, how can we help the vendors actually improve their products' performance in a way that benefits their clients?
We have been involved in recent industry discussions, and it emerges that some vendors spend literally millions of dollars in engineering time altering their product so that it performs better in certain types of tests (I'm thinking here of tests which measure the increase in system boot time, steady state load etc). Whilst these tests do have a value in some sense, some vendors feel this money and these resources would be better spent in other areas which actually help improve protection for clients.
The consequence of the above is that vendors can see testing as a "cost" rather than an "investment". This is why we have always tried to present ourselves as an efficacy assessment and assurance house as opposed to simply a testing lab.
The majority of our work is private external quality assessment services, which serve as an independent and on-going benchmark, helping vendors ensure their product is running as they would like. We are also the world's largest supplier of malicious URLs and binaries (300-500 thousand unique binaries and around 250 thousand URLs each day) and supply these to the majority of vendors to help them protect their customers better. We also supply our feed to some testing labs. For instance, if you look at the tests our friend Neil Rubenking conducts at PC Mag, you will see he uses our feed.
In any case, we are always open to discussion and welcome constructive feedback.
Cheers,
Chris.