Your database model is finally finished and ready to support your client's new system. All entities, individual fields, and relationships are there. You feel they all make perfect sense, and you accounted for all major problems you could foresee. And now you have to test it.
But your database is empty.
How do you test a database without data? You can't. Of course, developers already entered several records with all text data looking something like 'asdf' and all the number being like '1234'. That doesn't help at all.
You need better data.
So how do you go about populating your test database? Keying in the data manually is out of the question. It's not only tedious work, but also time consuming and prone to mistakes. And of course, you need more data. Enough to see how the database behaves with decent data volumes. If production data exists, maybe you could simply copy it. Not the complete database of course, but enough to be able to meaningfully test new database.
But existing data isn't exactly ready.
You must develop all the data mappings. And many trivial and not so trivial data transformations. Truth, you probably have already done part of the work to migrate parts of critical data from the old system. And now you realize you would have to do almost the same for the entire database. Maybe you remember recent work of developing the data migration procedures and scripts. If so, you shudder at the thought. No way you will voluntarily go in that particular nightmare again.
And what about data confidentiality?
Surely you don't want to be responsible for thousands of credit card numbers. Or any other highly sensitive data, for that matter. And overall protection of personal information is a critical issue nowadays. In fact you realize you must take compliance with data protection regulations into account. If you would be able to use data from live database, you would need to develop scripts to scramble all the sensitive data. At least.
Better develop your scripts to simply generate random data.
But you need the data that looks real. How would you recognize a problem in a thousand record sheet if all the data looked like 'hTr4Qg81jP'? You need lists of names and surnames. And a way to generate half decent addresses, e-mails and phone numbers. Lot of work, however you look at it. And you must solve the foreign key relationships. The idea pains you, it should have been the first thing that comes to mind.
Vivid picture finally visualizes in your mind.
You will write all the scripts. It will take you days. You will make sure all the formats are correct, and all foreign key relationships resolved. Of course, all foreign keys must be ensured to have actual values in primary tables. You will determine precise order in which to execute the scripts. Once done, the real problems begin. You will be running scripts and monitoring results. When a script fails, you will find the problem and fix it. And run the scripts again. If you are lucky you will have to re-run only part of them, and without having to empty the database first.
And what will you do in two weeks when database model changes?
You discard the whole idea. So if copying live data is a bad idea and generating data yourself means countless hours of tedious work, what remains? Use a specialized test data generator tool.
So if copying production data is a bad idea and generating data yourself means countless hours of tedious work, what remains?
Use a specialized test data generator tool.