I’ve always been a big proponent of using the best test data possible. As a developer, I find it’s very easy to get lost in the details of implementation, and I tend to leave the generation of test data until a later stage of a project.
The problem is that when I leave test data until the end, more often than not I end up testing only edge cases. Those are extremely important, but it means I don’t spend enough time generating large quantities of normal data, and without realistic volumes it’s easy to miss performance problems.
So, as of today, I’m making sure I have sufficient amounts of test data up front. Best of all, I’m focusing on automated data population from various sources. One of my favourite sources for this is Wikipedia. For any large body of text, I find just grabbing a random article is perfect.
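Here’s a minimal sketch of how grabbing a random article could look in Python, assuming the public MediaWiki Action API on English Wikipedia with its `generator=random` and `prop=extracts` options. The function names (`build_random_article_url`, `extract_text`, `fetch_random_article`) and the User-Agent string are my own placeholders, not part of any library, and the fetch step is passed in as a parameter so the parsing can be exercised without hitting the network.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia's Action API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_random_article_url():
    """Build a query URL asking for one random article as plain text."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "random",  # pick pages at random
        "grnnamespace": 0,      # namespace 0 = regular articles
        "grnlimit": 1,          # one page per request
        "prop": "extracts",     # return the article text
        "explaintext": 1,       # plain text rather than HTML
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_text(payload):
    """Return (title, text) for the first page in a parsed API response."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("title", ""), page.get("extract", "")
    raise ValueError("no page found in response")


def _http_get(url):
    """Default fetcher: a plain GET with a descriptive User-Agent."""
    request = urllib.request.Request(
        url,
        # Placeholder value; replace with something identifying your project.
        headers={"User-Agent": "test-data-generator/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def fetch_random_article(fetch=_http_get):
    """Fetch one random article; `fetch` can be swapped for a fake in tests."""
    raw = fetch(build_random_article_url())
    return extract_text(json.loads(raw))
```

Calling `fetch_random_article()` in a loop and writing each result into the test database would be one way to build up a decent volume of normal data. If you do that, keep the request rate low and cache what you download so repeated test runs don’t keep hitting Wikipedia.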