As the title says, I am building a database with three tables that will store names:
- Table 1 (`firstName`) stores first names
- Table 2 (`lastName`) stores last names
- Table 3 (`first_to_last`) is a linking table that pairs first-name IDs with last-name IDs, one row per first/last combination
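A minimal schema sketch for these three tables might look like the following. The column names (`firstid`, `first`, `lastid`, `last`, `LastUpdated`) come from the SQL in this post. The types, lengths, key names and the `NameSchema` class are assumptions for illustration. The unique keys matter: `ON DUPLICATE KEY UPDATE` only fires when a row collides with a `PRIMARY KEY` or `UNIQUE` index.

```csharp
// Hypothetical DDL kept in a C# constant, matching how the post embeds SQL.
// Column names follow the post's queries; types, lengths and index names are assumed.
public static class NameSchema
{
    public const string CreateTables = @"
CREATE TABLE firstName (
    firstid     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first       VARCHAR(100) NOT NULL,
    LastUpdated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY uq_first (first)      -- lets ON DUPLICATE KEY UPDATE detect repeats
) ENGINE=InnoDB;

CREATE TABLE lastName (
    lastid      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    last        VARCHAR(100) NOT NULL,
    LastUpdated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY uq_last (last)
) ENGINE=InnoDB;

CREATE TABLE first_to_last (
    firstid     INT UNSIGNED NOT NULL,
    lastid      INT UNSIGNED NOT NULL,
    LastUpdated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (firstid, lastid),   -- one row per first/last pair
    FOREIGN KEY (firstid) REFERENCES firstName (firstid),
    FOREIGN KEY (lastid)  REFERENCES lastName (lastid)
) ENGINE=InnoDB;";
}
```

Because a first name such as John can pair with many last names, the link table is effectively many-to-many; the composite primary key is what keeps each pair unique.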
All of this data will come from a text file in the following format:

```
Firstname MI Lastname
```
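A small parsing sketch for that format, assuming one name per line with whitespace-separated tokens. The `NameLineParser` class is hypothetical. Like the post's code, it keeps tokens 0 and 2 and drops the middle initial. Accepting lines with no middle initial is an extra assumption.

```csharp
using System;

// Hypothetical helper: splits one "Firstname MI Lastname" line.
// Assumes whitespace-separated tokens; the middle initial is dropped,
// as in the post's use of countSplit[0] and countSplit[2].
public static class NameLineParser
{
    public static bool TryParse(string line, out string first, out string last)
    {
        first = last = null;
        if (line == null) return false;

        // RemoveEmptyEntries tolerates repeated spaces or tabs between tokens.
        string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        if (parts.Length == 3)        // Firstname MI Lastname
        {
            first = parts[0];
            last = parts[2];
            return true;
        }
        if (parts.Length == 2)        // Firstname Lastname (assumed: MI may be missing)
        {
            first = parts[0];
            last = parts[1];
            return true;
        }
        return false;                 // any other shape is rejected for manual review
    }
}
```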
I estimate that there will be over 100 million records. I also don't want a plain insert for every name: if a name already exists, its record should be updated instead (insert, or update on duplicate key). What is the most efficient way to load all of this into the database? The tables use InnoDB, and because I'm worried about lock contention, I'm not running multiple insert/update batches at the same time.
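One common way to cut the per-name round trips is to de-duplicate names in memory and send them as multi-row `INSERT ... ON DUPLICATE KEY UPDATE` statements. The sketch below only builds the SQL text and parameter values. It assumes the unique keys shown earlier, a hypothetical `BatchUpsertBuilder` class, and a batch size that would need tuning. It also assumes the distinct names fit in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical batching helper: turns many names into a few multi-row
// INSERT ... ON DUPLICATE KEY UPDATE statements with @p0, @p1, ... placeholders.
// table/column are interpolated, so they must be trusted constants, never input data.
public static class BatchUpsertBuilder
{
    public static List<(string Sql, Dictionary<string, string> Parameters)> Build(
        string table, string column, IEnumerable<string> names, int batchSize = 1000)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Order-preserving de-duplication so each name is sent once per load.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var distinct = new List<string>();
        foreach (string name in names)
            if (seen.Add(name)) distinct.Add(name);

        var batches = new List<(string Sql, Dictionary<string, string> Parameters)>();
        for (int start = 0; start < distinct.Count; start += batchSize)
        {
            int end = Math.Min(start + batchSize, distinct.Count);
            var sql = new StringBuilder($"INSERT INTO {table} ({column}, LastUpdated) VALUES ");
            var parameters = new Dictionary<string, string>();

            for (int i = start; i < end; i++)
            {
                string p = "@p" + (i - start);
                if (i > start) sql.Append(", ");
                sql.Append($"({p}, CURRENT_TIMESTAMP)");
                parameters[p] = distinct[i];
            }

            // Names that already exist only get their timestamp refreshed.
            sql.Append(" ON DUPLICATE KEY UPDATE LastUpdated = CURRENT_TIMESTAMP;");
            batches.Add((sql.ToString(), parameters));
        }
        return batches;
    }
}
```

With MySql.Data, each pair could be sent as one `MySqlCommand`, adding every entry through `cmd.Parameters.AddWithValue(key, value)` and calling `ExecuteNonQuery()`. The practical upper bound on batch size is the server's `max_allowed_packet`. Ordinal comparison is used here, but MySQL's default collations are case-insensitive, so the server may still merge "john" and "John"; the upsert absorbs that.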
The whole process runs in C# over a MySQL connection, and for each line of the file I currently build these statements:
```csharp
// countSplit is one split input line: [0] = first name, [1] = MI, [2] = last name.
// Values are concatenated straight into the SQL, so a name containing an
// apostrophe (e.g. O'Brien) breaks the statement.

// 1. Refresh the first name if it exists, otherwise insert it.
sqlRequest += "START TRANSACTION ;" +
    "UPDATE firstName SET LastUpdated = CURRENT_TIMESTAMP WHERE first= '" + countSplit[0] + "' ;" +
    "INSERT INTO firstName(first, LastUpdated) SELECT '" + countSplit[0] + "' AS first, CURRENT_TIMESTAMP AS LastUpdated FROM dual WHERE NOT EXISTS ( SELECT * FROM firstName d WHERE d.first= '" + countSplit[0] + "') ;" +
    "COMMIT ;";
// 2. Same for the last name.
sqlRequest += "START TRANSACTION ;" +
    "UPDATE lastName SET LastUpdated = CURRENT_TIMESTAMP WHERE last = '" + countSplit[2] + "' ;" +
    "INSERT INTO lastName (last, LastUpdated) SELECT '" + countSplit[2] + "' AS last, CURRENT_TIMESTAMP AS LastUpdated FROM dual WHERE NOT EXISTS ( SELECT * FROM lastName d WHERE d.last = '" + countSplit[2] + "') ;" +
    "COMMIT ;";
// 3. Link the two IDs, or refresh the timestamp if the pair already exists.
sqlRequest += "START TRANSACTION ;" +
    "INSERT INTO first_to_last " +
    "(firstid,lastid,LastUpdated) VALUES " +
    "((SELECT firstid FROM firstName WHERE first='" + countSplit[0] + "')," +
    "(SELECT lastid FROM lastName WHERE last='" + countSplit[2] + "' )," +
    "CURRENT_TIMESTAMP) " +
    "ON DUPLICATE KEY UPDATE LastUpdated = CURRENT_TIMESTAMP;" +
    "COMMIT ;";
```
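For comparison, here is a sketch of the same three steps with the values passed as parameters (`@first`, `@last`) rather than concatenated, and each UPDATE plus INSERT ... WHERE NOT EXISTS pair collapsed into a single upsert inside one transaction. It assumes unique keys on `first`, `last` and `(firstid, lastid)`. The `PerLineUpsert` class and parameter names are hypothetical, and this is a swapped-in pattern, not the post's original method.

```csharp
using System.Collections.Generic;

// Hypothetical per-line rewrite: same tables and columns as the post,
// one transaction, values bound as parameters instead of concatenated.
// Assumes UNIQUE keys on first, last and (firstid, lastid).
public static class PerLineUpsert
{
    public const string Sql =
        "START TRANSACTION;" +
        " INSERT INTO firstName (first, LastUpdated) VALUES (@first, CURRENT_TIMESTAMP)" +
        " ON DUPLICATE KEY UPDATE LastUpdated = CURRENT_TIMESTAMP;" +
        " INSERT INTO lastName (last, LastUpdated) VALUES (@last, CURRENT_TIMESTAMP)" +
        " ON DUPLICATE KEY UPDATE LastUpdated = CURRENT_TIMESTAMP;" +
        " INSERT INTO first_to_last (firstid, lastid, LastUpdated) VALUES (" +
        "(SELECT firstid FROM firstName WHERE first = @first)," +
        " (SELECT lastid FROM lastName WHERE last = @last)," +
        " CURRENT_TIMESTAMP)" +
        " ON DUPLICATE KEY UPDATE LastUpdated = CURRENT_TIMESTAMP;" +
        " COMMIT;";

    // Parameter values for one parsed line; the driver binds them,
    // so apostrophes (O'Brien) need no manual escaping.
    public static Dictionary<string, string> ParametersFor(string first, string last) =>
        new Dictionary<string, string> { ["@first"] = first, ["@last"] = last };
}
```

One side effect worth knowing: in InnoDB, an upsert that hits an existing row can still consume an `AUTO_INCREMENT` value under the default lock modes, so `firstid` and `lastid` may end up with gaps.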
Is this the best approach, or is there a better way?