Creating Sample Data

AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Post by AJ »

Hi All,

As well as OS dev, I'm also creating a large database-based Windows app in C#. A big part of this system is contact management.

Does anyone know of any programs which generate a large amount of (sensible) random data (Forename, Surname, Title, Address etc.) which I can use to test my system on a larger scale? I'm talking about 10,000-20,000 records. I can import data from most database (and formatted text file) types.

If such a program was biased towards producing UK-type postal codes, that would be useful.

Cheers,
Adam
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster »

Tried the telephone dictionary? :wink:
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Post by AJ »

Combuster wrote: Tried the telephone dictionary? :wink:
:roll:

I didn't mention - I'm not a professional developer so have no funds to buy *real* data or employ an office monkey, and don't have time to input 20k names from a paper directory! I'm aware of online directories where I can search for individual names for free, but not where I can download that quantity of data.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Well, either you're doing it professionally (and honestly, a phone directory on CD-ROM doesn't cost the world), or you're whipping up a script that takes some 20 forenames, 20 surnames, 20 postal codes etc. and combines them at random...
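
A minimal sketch of the "combine short lists at random" script described above, written in Python as one possible scripting language. The seed lists, the field names, and the helpers `random_postcode`, `random_contact` and `write_contacts` are all hypothetical, chosen only for illustration. The postcodes are merely shaped like UK postcodes and are not checked against real ones.

```python
import csv
import random
import string

# Hypothetical seed lists; extend each to 20 or so entries for more variety.
TITLES = ["Mr", "Mrs", "Ms", "Dr"]
FORENAMES = ["Adam", "Sarah", "James", "Emma", "Oliver", "Priya"]
SURNAMES = ["Smith", "Jones", "Taylor", "Brown", "Patel", "Evans"]
STREETS = ["High Street", "Church Lane", "Station Road", "Mill Lane"]
TOWNS = ["Exeter", "Plymouth", "Torquay", "Barnstaple"]
POSTCODE_AREAS = ["EX", "PL", "TQ"]  # example area prefixes only

# The last two letters of a UK postcode never use C, I, K, M, O or V.
UNIT_LETTERS = "ABDEFGHJLNPQRSTUWXYZ"

FIELDS = ["title", "forename", "surname", "address", "town", "postcode"]


def random_postcode(rng):
    """Return a string with the shape of a UK postcode (not a real lookup)."""
    area = rng.choice(POSTCODE_AREAS)
    district = rng.randint(1, 20)
    sector = rng.randint(0, 9)
    unit = "".join(rng.choice(UNIT_LETTERS) for _ in range(2))
    return f"{area}{district} {sector}{unit}"


def random_contact(rng):
    """Build one contact record by picking each field independently."""
    return {
        "title": rng.choice(TITLES),
        "forename": rng.choice(FORENAMES),
        "surname": rng.choice(SURNAMES),
        "address": f"{rng.randint(1, 200)} {rng.choice(STREETS)}",
        "town": rng.choice(TOWNS),
        "postcode": random_postcode(rng),
    }


def write_contacts(path, count, seed=0):
    """Write `count` random contacts to a CSV file with a header row."""
    rng = random.Random(seed)  # fixed seed makes the test data repeatable
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for _ in range(count):
            writer.writerow(random_contact(rng))


if __name__ == "__main__":
    write_contacts("contacts.csv", 20000)
```

The CSV output is one choice among several; any formatted text file the application can import would do. Titles and forenames are picked independently here, so mismatched combinations will occur.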
Every good solution is obvious once you've found it.
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Post by mystran »

AJ wrote: I didn't mention - I'm not a professional developer so have no funds to buy *real* data or employ an office monkey, and don't have time to input 20k names from a paper directory! I'm aware of online directories where I can search for individual names for free, but not where I can download that quantity of data.
Actually, I guess this is the type of thing you could solve without really testing, if you do a little analysis. In any sane implementation, the bottleneck will be the database, and unless you are really into database design, you should probably be using some off-the-shelf SQL database. Then your performance is pretty dependent on two things: how many queries you need to do, and whether those queries can be optimized by the database using indexes.

You don't need 20k tuples in your database to figure such things out. You need around 20. Then look at the number of queries you do, and ask your DBMS to explain (often this is indeed the command "EXPLAIN") how it executes those queries. Alternatively, you could just feed in some 20 sensible entries, and then generate (with a small program in whatever scripting language) any number of not-so-sensible entries, which just happen to have the right format. This is a good strategy if your DBMS is too intelligent to use an index when tables are small enough that it's faster to just read through them.
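
As a rough illustration of both ideas, the sketch below fills an in-memory SQLite table with nonsense rows of the right format and asks SQLite how it would run a lookup, before and after adding an index. SQLite and its `EXPLAIN QUERY PLAN` statement are used only as a convenient stand-in; the command and its output differ between database engines. The table layout, the index name, and the helpers `filler_name`, `build_db` and `query_plan` are assumptions made for this example.

```python
import random
import sqlite3
import string


def filler_name(rng, length=8):
    """Return a nonsense, capitalised string that merely has the right format."""
    letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
    return "".join(letters).capitalize()


def build_db(filler_rows=20000, seed=0):
    """Create an in-memory contacts table filled with not-so-sensible rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "id INTEGER PRIMARY KEY, forename TEXT, surname TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts (forename, surname) VALUES (?, ?)",
        ((filler_name(rng), filler_name(rng)) for _ in range(filler_rows)),
    )
    conn.commit()
    return conn


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


if __name__ == "__main__":
    conn = build_db()
    sql = "SELECT * FROM contacts WHERE surname = ?"

    # Without an index, the plan should describe a scan of the whole table.
    print(query_plan(conn, sql, ("Smith",)))

    # With an index on the searched column, the plan should mention it.
    conn.execute("CREATE INDEX idx_contacts_surname ON contacts (surname)")
    print(query_plan(conn, sql, ("Smith",)))
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name in the output than to match the whole line.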

edit: oh and if you're unable to offload most of the work to the database engine, consider a redesign. ;)
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.