I'm working on a Python script to migrate a MySQL database into a PostgreSQL database with a different Schema (different table structures, different datatypes and so on).
I'm a sysadmin and unfortunately I don't code very often. So I'm having some doubts about this initial programming phase.
I begin with the tables that are easy to be migrate (almost the same structure), but very soon I will have to transfer tables that need more operations to be converted for compatability.
My code actually looks like this:
#!/usr/bin/python
# Script Name: database-migration_msql-psql.py
# Description: Migrate mysql database a2
# into postgresql database a3.
# Created By: phphil.
# Date: 7 Oct 2015.
#
# ------------------------------
# Import standard libraries |
# ------------------------------
#
import os
import sys
import mysql.connector
import psycopg2
from pprint import pprint
import MySQLdb
# ------------------------------
# Import internal snippets |
# ------------------------------
#
from include.db_config import *
#from include.MySQLCursorDict import *
# ------------------------------
# Open database connections |
# ------------------------------
#
# Mysql connection
try:
cnx_msql = mysql.connector.connect( host=host_mysql, user=user_mysql, passwd=pswd_mysql, db=dbna_mysql )
except mysql.connector.Error as e:
print "MYSQL: Unable to connect!", e.msg
sys.exit(1)
# Postgresql connection
try:
cnx_psql = psycopg2.connect(conn_string_psql)
except psycopg2.Error as e:
print('PSQL: Unable to connect!\n{0}').format(e)
sys.exit(1)
# Cursors initializations
cur_msql = cnx_msql.cursor(dictionary=True)
cur_psql = cnx_psql.cursor()
# ---------------------------------
# A2.msql-table1 > A3.psql-table1 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3, field4, field5 FROM msql-table1")
for row in cur_msql:
### trasformation/ conversion of mysql data OR in other cases type casting
if row['user_id'] == 0:
row['user_id'] = row['group_id']
else:
pass
try:
cur_psql.execute("INSERT INTO psql-table1 (field1, field2, field3, field4, field5) \
VALUES (%(field1)s, %(field2)s, %(field3)s, %(field4)s, %(field5)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# ---------------------------------
# A2.msql-table2 > A3.psql-table2 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3, field4, field5, field6 FROM msql-table2")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO psql-table2 (field1, field2, field3, field4, field5, field6) \
VALUES (%(field1)s, %(field2)s, %(field3)s, %(field4)s, %(field5)s, %(field6)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# ---------------------------------
# A2.msql-table3 > A3.psql-table3 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2 FROM msql-table3")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO psql-table3 (field1, field2) VALUES (%(field1)s, %(field2)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# ---------------------------------
# A2.msql-table4 > A3.psql-table4 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3 FROM msql-table4")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO psql-table4 (field1, field2, field3) \
VALUES (%(field1)s, %(field2)s, %(field3)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# ---------------------------------
# A2.msql-table4 > A3.psql-table4 |
# ---------------------------------
#
cur_msql.execute("SELECT l.field1, r.field2, l.field3, l.field4, l.field5, l.field6, l.field7, l.field8 \
FROM msql-table4 l, msql-table0 r \
WHERE l.field2=r.field2")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO psql-table4(field1, field2, field3, field4, field5, field6, field7, field8, field9) \
VALUES(%(field1)s, %(field2)s, %(field3)s, %(field4)s, %(field5)s, %(field6)s, %(field7)s, %(field8)s, %(field9)s, NULL, DEFAULT)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# ---------------------------------
# A2.msql-table5 > A3.psql-table5 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2, field3, field4, field5, field6, field7, field8, field9, field10 FROM msql-table5")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO psql-table5 (field1, field2, field3, field4, field5, field6, field7, field8, field9, field10) \
VALUES (%(field1)s, %(field2)s, %(field3)s, %(field4)s, %(field5)s, %(field6)s, %(field7)s, %(field8)s, %(field9)s, %(field10)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
# ---------------------------------
# A2.msql-table6 > A3.psql-table6 |
# ---------------------------------
#
cur_msql.execute("SELECT field1, field2 FROM msql-table6")
for row in cur_msql:
try:
cur_psql.execute("INSERT INTO psql-table6 (field1, field2) VALUES (%(field1)s, %(field2)s)", row)
except psycopg2.Error as e:
print "cannot execute that query!!", e.pgerror
sys.exit("Some problem occured with that query! leaving early this lollapalooza script")
################ END OF SCRIPT ################
# ---------------------------------------------
# Finalizing stuff & closing db connections |
# ---------------------------------------------
#
## Closing cursors
cur_msql.close()
cur_psql.close()
## Committing
cnx_psql.commit()
## Closing database connections
cnx_msql.close()
cnx_psql.close()
As you will notice, in each section of the script the structure is almost the same:
- Select data from a table of the source database (mysql), the result is handled by a cursor with dictionary flag (a python dictionary).
- After this, the dictionary is iterated within a for loop where possible e.g. casting fields, or adapt the table structure (see section: A2.right > A3.permission).
- And still inside the for loop, each record is inserted in the destination database.
Questions/Doubts:
Do I need to create a class in order to abstract the redundant code? Or maybe it's better to just create a function? Can someone post a short example? I have no Idea how to proceed.
In both cases I see some problems on abstracting it because the redundant code is inside a loop where I will have to do different operations depending on what table I'm iterating.I used to open and close cursors at each operation(script section), then I decided to open both cursors at the beginning of the script, and use them until the end and close them. But now I've read this and I'm confused. What's better in your opinion? One cursor for each operation, or one cursor for the whole script?