• 2015-10-29 21:15:00

    Setting up AWS RDS MySQL replication offsite

    Preface

    Recently I worked on setting up replication between our AWS RDS instance and a server running as a MySQL slave on our premises. Although the task was not difficult, there are quite a few areas that one needs to pay attention to.

    In this blog post, I am going to outline the process step by step, so that others can benefit and not lose time trying to discover what went wrong.

    Disclaimer: Google, DuckDuckGo, the AWS forums and this blog post have been invaluable guides to help me do what I needed to do.

    Setup

    • One RDS MySQL or Aurora instance (Master)
    • One server running MySQL in your premises (or wherever you want to put it) (Slave)
    • Appropriate access on the master for the IP of the slave (e.g. the RDS security group must allow connections from the slave).

    Master Setup

    There is little to do on our master (RDS). Depending on the database size and update frequency, we need to set the maximum retention time for the binary logs. For a very large database this number has to be high enough to give us time to export the database from the master, import it into the slave and start replication.

    Connect to your database and run the following command:

    MySQL [(none)]> call mysql.rds_set_configuration('binlog retention hours', 24);
    

    You can use a different number of hours; I am using 24 for this example.
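
    If you want to double check the value you just set, RDS also provides a companion procedure that echoes the current settings back; calling it is a quick sanity check:

    MySQL [(none)]> call mysql.rds_show_configuration;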

    Slave Setup

    I am assuming that MySQL is installed on the machine that has been designated as the slave, and that the machine has ample space for the actual data as well as the binary logs that will be created for the replication.

    Edit my.cnf

    The location of this file is usually under /etc or /etc/mysql. Depending on your distribution it might be located elsewhere.

    [mysqld]
    ...
    
    #bind-address = 0.0.0.0
    
    # Logging and Replication
    general_log_file  = /logs/mysql.log
    general_log       = 1
    log_error         = /logs/mysql_safe.log
    log_slow_queries  = /logs/mysql-slow.log
    long_query_time   = 2
    slave-skip-errors = 1062
    log-queries-not-using-indexes
    
    server-id         = 1234567
    log_bin           = /logs/mysql-bin.log
    expire_logs_days  = 2
    max_binlog_size   = 100M
    

    Note: The configuration file will contain a lot more entries but the ones above are the ones you need to pay attention to.

    • bind-address: The distribution default usually binds MySQL to 127.0.0.1, which only allows local connections. Comment this line out (or bind to 0.0.0.0) so that you can connect to the instance from somewhere else on the network. Keep it restricted if you are going to work only on the slave machine and allow no connections from elsewhere.
    • general_log_file: The location of your query log file. You can disable this (see next entry) but it is always good to keep it on at least at the start, to ensure that replication is moving smoothly. Tailing that log will give you a nice indicator of the activity in your database.
    • general_log: Enable or disable the general log
    • log_error: Where to store the errors log
    • log_slow_queries: Where to store the slow queries log. Especially helpful in identifying bottlenecks in your application
    • long_query_time: The threshold (in seconds) above which a query is considered slow
    • slave-skip-errors: 1062 is the "Duplicate entry 'xyz' for key 1" error that appears when a replicated query tries to insert a row that already exists on the slave. Skipping it is especially helpful when the replication starts.
    • log-queries-not-using-indexes: We want this because it can help identify potential bottlenecks in the application
    • server-id: A unique ID for your slave instance.
    • log_bin: Where the binary replication logs are kept
    • expire_logs_days: How long to keep the replication logs for
    • max_binlog_size: Maximum replication log size (per file)

    Once you have set these up, restart your MySQL instance:

    /etc/init.d/mysql restart
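
    A quick sanity check at the MySQL prompt will confirm that the restarted instance picked up the new settings (the variable names mirror the my.cnf entries above):

    MySQL [(none)]> SHOW VARIABLES WHERE Variable_name IN ('server_id', 'log_bin', 'general_log');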
    
    Download the SSL certificate bundle for RDS

    In your slave server, navigate to /etc/mysql and download the rds-combined-ca-bundle.pem file. The slave will use this certificate bundle so that all the replication traffic is encrypted with SSL and nobody can eavesdrop on your data in transit.

    cd /etc/mysql
    wget http://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
    

    NOTE You can put the rds-combined-ca-bundle.pem anywhere on your slave. If you change the path, you will have to modify the command to connect the slave to the master (shown further below) to specify the exact location of the key.

    Import timezone data

    This step might not be necessary depending on your MySQL installation. However, since RDS works in UTC, you might find your replication breaking because your slave MySQL instance does not know the named time zones (such as UTC). The shell command you need to run on your slave machine to fix this is:

    mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql
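
    To verify that the time zone tables were actually loaded, ask MySQL to convert between two named zones; a NULL result means the import did not work:

    MySQL [(none)]> SELECT CONVERT_TZ('2015-10-29 21:15:00', 'UTC', 'Europe/London');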
    
    Creating the RDS related tables

    RDS uses its own tables to keep track of the replication status and other related data such as the replication heartbeat, configuration etc. Those tables need to be present in the mysql database of your slave in order for the replication to work.

    DROP TABLE IF EXISTS `rds_configuration`;
    CREATE TABLE `rds_configuration` (
      `name` varchar(100) NOT NULL,
      `value` varchar(100) DEFAULT NULL,
      `description` varchar(300) NOT NULL,
      PRIMARY KEY (`name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    DROP TABLE IF EXISTS `rds_global_status_history`;
    CREATE TABLE `rds_global_status_history` (
      `collection_end` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `collection_start` timestamp NULL DEFAULT NULL,
      `variable_name` varchar(64) NOT NULL,
      `variable_value` varchar(1024) NOT NULL,
      `variable_delta` int(20) NOT NULL,
      PRIMARY KEY (`collection_end`,`variable_name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    DROP TABLE IF EXISTS `rds_global_status_history_old`;
    CREATE TABLE `rds_global_status_history_old` (
      `collection_end` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `collection_start` timestamp NULL DEFAULT NULL,
      `variable_name` varchar(64) NOT NULL,
      `variable_value` varchar(1024) NOT NULL,
      `variable_delta` int(20) NOT NULL,
      PRIMARY KEY (`collection_end`,`variable_name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    DROP TABLE IF EXISTS `rds_heartbeat2`;
    CREATE TABLE `rds_heartbeat2` (
      `id` int(11) NOT NULL,
      `value` bigint(20) DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    DROP TABLE IF EXISTS `rds_history`;
    CREATE TABLE `rds_history` (
      `action_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `called_by_user` varchar(50) NOT NULL,
      `action` varchar(20) NOT NULL,
      `mysql_version` varchar(50) NOT NULL,
      `master_host` varchar(255) DEFAULT NULL,
      `master_port` int(11) DEFAULT NULL,
      `master_user` varchar(16) DEFAULT NULL,
      `master_log_file` varchar(50) DEFAULT NULL,
      `master_log_pos` mediumtext,
      `master_ssl` tinyint(1) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    DROP TABLE IF EXISTS `rds_replication_status`;
    CREATE TABLE `rds_replication_status` (
      `action_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `called_by_user` varchar(50) NOT NULL,
      `action` varchar(20) NOT NULL,
      `mysql_version` varchar(50) NOT NULL,
      `master_host` varchar(255) DEFAULT NULL,
      `master_port` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    DROP TABLE IF EXISTS `rds_sysinfo`;
    CREATE TABLE `rds_sysinfo` (
      `name` varchar(25) DEFAULT NULL,
      `value` varchar(50) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    

    NOTE I am not 100% sure that all of these tables are needed. I have only seen rds_heartbeat2 and rds_replication_status being used. You can experiment with these when you enable replication and add each table in turn if needed. You can confirm whether the definitions above are correct for your instance by connecting to the master and taking a mysqldump of the mysql database.
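
    Once replication is running (see further below), a simple way to see these tables doing their job is to query the heartbeat table on the slave; RDS updates it periodically on the master, so the replicated value should keep changing:

    MySQL [(none)]> SELECT * FROM mysql.rds_heartbeat2;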

    Replication

    Replication user

    We need to create a user in our master database that will have the appropriate rights to perform all the replication related actions. The following commands need to be run on the master. For this example I am creating a user called rpluser with the password 424242:

    MySQL [(none)]> CREATE USER 'rpluser'@'%' IDENTIFIED BY '424242';
    MySQL [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'rpluser'@'%';
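
    Still on the master, you can confirm that the user and grant are in place:

    MySQL [(none)]> SHOW GRANTS FOR 'rpluser'@'%';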
    
    Master Status

    Connect to your master and issue this command:

    MySQL [(none)]> show master status;
    

    The output will be something like this:

    +----------------------------+----------+--------------+------------------+-------------------+
    | File                       | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +----------------------------+----------+--------------+------------------+-------------------+
    | mysql-bin-changelog.000123 |   171819 |              |                  |                   |
    +----------------------------+----------+--------------+------------------+-------------------+
    

    Keep those values handy (File and Position) since we will use them to instruct the slave where to start requesting data from the master (binlog file and position).

    mysqldump

    Take a database dump of all the databases in RDS (excluding information_schema and mysql). If your database can afford a bit of downtime you can rely on the default --opt behaviour of mysqldump, which locks all tables while each database is dumped. If not, and your tables are InnoDB, the --single-transaction flag produces a consistent dump without locking; note that --skip-add-locks only affects the LOCK TABLES statements written into the dump file, not locking during the dump itself. More information about mysqldump options can be found here

    mysqldump --host=myinstance.rds.amazonaws.com --user='root' --password my_db > /backups/my_db.sql
    

    Adjust the above command to fit your needs. Once all databases have been dumped, we need to import them in the slave.

    Importing data in the slave

    Navigate to the folder where you have all the *.sql dump files, connect to the slave database and start sourcing them.

    cd /backups
    mysql --host=192.168.1.2 --user='root' --password
    MySQL [(none)]> create database my_db;
    MySQL [(none)]> use my_db;
    MySQL [my_db]> source my_db.sql;
    

    Repeat the process of creating the database, using it and sourcing the dump file until all your databases have been imported.

    NOTE There are other ways of doing the above: piping the results directly into the slave database, or even pulling the data straight from RDS without an intermediate mysqldump file. Whichever way you choose is up to you. In my experience, the direct import worked for a while, but once our database grew it started timing out or breaking mid-import, so I opted for the multi-step approach. Have a look at this section in the AWS RDS documentation for more options.

    Connecting to the master

    Once the restore has completed, it is time to connect the slave to the master. In order to do that we need the following pieces of information:

    • The name of the RDS instance (for the command below I will use myinstance.rds.amazonaws.com)
    • The name of the replication user (we chose rpluser)
    • The password of the replication user (we chose 424242)
    • The master log file (see above, we got mysql-bin-changelog.000123 from show master status;)
    • The master log file position (see above, we got 171819)
    • The location of the SSL certificate (we used /etc/mysql/rds-combined-ca-bundle.pem)

    The command we need to run on the slave MySQL server is (newlines added for readability):

    MySQL [(none)]> CHANGE MASTER TO 
        -> MASTER_HOST='myinstance.rds.amazonaws.com', 
        -> MASTER_USER='rpluser', 
        -> MASTER_PASSWORD='424242', 
        -> MASTER_LOG_FILE='mysql-bin-changelog.000123', 
        -> MASTER_LOG_POS=171819, 
        -> MASTER_SSL=1, 
        -> MASTER_SSL_CERT='', 
        -> MASTER_SSL_CA='/etc/mysql/rds-combined-ca-bundle.pem', 
        -> MASTER_SSL_KEY='';
    
    Starting the replication

    All we have to do now is to start the slave:

    MySQL [(none)]> START SLAVE;
    

    We can check that everything is OK either by tailing the general log (see the my.cnf section) from the shell:

    tail -f /logs/mysql.log
    

    or by issuing this command on the mysql prompt:

    MySQL [(none)]> SHOW SLAVE STATUS \G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: myinstance.rds.amazonaws.com
                      Master_User: rpluser
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin-changelog.000123
              Read_Master_Log_Pos: 171819
                   Relay_Log_File: mysqld-relay-bin.000002
                    Relay_Log_Pos: 123
    ...
                       Last_Errno: 0
                       Last_Error: 
    ...
               Master_SSL_Allowed: Yes
               Master_SSL_CA_File: /etc/mysql/rds-combined-ca-bundle.pem
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
    ...
          Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
    ...
    

    Congratulations on your working RDS slave (offsite) machine :)

    Conclusion

    This blog post by no means exhausts all the topics that replication covers. For additional information please see the references below.

    I hope you find this post helpful :)

    References

  • 2013-09-15 12:00:00

    Let the RDBMS do more than just store data

    One of the common "mistakes" that programmers make (and I have been guilty as charged many a time in the past) is not using the tools that are available to them to the fullest extent possible.

    A common example is using the RDBMS of your choice to only store and retrieve data, without taking advantage of its power and its features to the full extent.

    An RDBMS can do much, much more. One can use triggers that auto update fields (as I will demonstrate in this blog post), log data into tables, trigger cascade deletes and so on; stored procedures can compute complex data sets, joining tables and transforming data; views can offer easier representations of data, hiding complex queries from the actual application. Features such as stored procedures and views can also offer security as well as maintainability benefits: execution can be restricted to particular groups/logins, and changing a stored procedure or view only requires a change at the database layer, not in the application itself.
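
    As a small illustration of that last point, the sketch below (the tables and the reporting account are made up for this example) hides a join behind a view and then grants read access to the view only, so the reporting login never touches the underlying tables:

    CREATE VIEW v_customer_order_totals AS
        SELECT c.id, c.name, SUM(o.amount) AS total_spent
        FROM customers c
        INNER JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name;

    -- The reporting account can read the view but not the tables behind it
    GRANT SELECT ON mydb.v_customer_order_totals TO 'reporting'@'localhost';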

    In this blog post I will show you a simple example on how one can transfer some of the processing of an application to the RDBMS. I am using MariaDB as the RDBMS and PhalconPHP as the PHP framework.

    The RDBMS

    Each table of my database has several common fields that are used for logging and reporting as well as recording status.

    An example table is as follows

    CREATE TABLE IF NOT EXISTS co_address (
      id             int(11)      unsigned NOT NULL AUTO_INCREMENT,
      address_line_1 varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
      address_line_2 varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
      address_line_3 varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
      region         varchar(6)   COLLATE utf8_unicode_ci DEFAULT NULL,
      post_code      varchar(24)  COLLATE utf8_unicode_ci DEFAULT NULL,
      country        varchar(2)   COLLATE utf8_unicode_ci DEFAULT NULL,
      created_id     int(11)      unsigned NOT NULL DEFAULT '0',
      created_date   datetime              NOT NULL,
      updated_id     int(11)      unsigned NOT NULL DEFAULT '0',
      updated_date   datetime              NOT NULL,
      deleted        tinyint(1)   unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (id),
      KEY created_id (created_id),
      KEY created_date (created_date),
      KEY updated_id (updated_id),
      KEY updated_date (updated_date),
      KEY deleted (deleted)
    ) ENGINE=InnoDB  
    DEFAULT CHARSET=utf8
    COLLATE=utf8_unicode_ci
    COMMENT='Holds addresses for various entities' AUTO_INCREMENT=1 ;
    

    The fields are:

    • created_id: The id of the user that created the record
    • created_date: The date/time that the record was created
    • updated_id: The id of the user that last updated the record
    • updated_date: The date/time that the record was last updated
    • deleted: A soft delete flag

    There is not much I can do with the user ids (created/updated) or the deleted column (see also the notes below regarding this). However, as far as the dates are concerned, I can definitely let MariaDB handle those updates.

    Triggers

    The work is delegated to triggers, attached to each table.

    --
    -- Triggers address
    --
    DROP TRIGGER IF EXISTS trg_created_date;
    DELIMITER //
    CREATE TRIGGER trg_created_date BEFORE INSERT ON co_address
     FOR EACH ROW SET NEW.created_date = NOW(), NEW.updated_date = NOW()
    //
    DELIMITER ;
    DROP TRIGGER IF EXISTS trg_updated_date;
    DELIMITER //
    CREATE TRIGGER trg_updated_date BEFORE UPDATE ON co_address
     FOR EACH ROW SET NEW.updated_date = NOW()
    //
    DELIMITER ;
    

    The triggers above update the created_date and updated_date fields automatically upon insert/update.
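
    A quick way to see this in action, assuming a record with id 1 already exists in the table, is to update any column and read the dates back; updated_date is bumped to the current time by the trigger while created_date stays untouched:

    UPDATE co_address SET post_code = 'AB1 2CD' WHERE id = 1;

    SELECT id, created_date, updated_date FROM co_address WHERE id = 1;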

    Phalcon Model

    I needed to make some changes to my model Address, in order to allow the triggers to work without interference from the model.

    class Address extends PhModel
    {
        public function initialize()
        {
            // Disable literals
            $this->setup(['phqlLiterals' => false]);
    
            // We skip these since they are handled by the RDBMS
            $this->skipAttributes(
                [
                    'created_date',
                    'updated_date',
                ]
            );
    
        }
    
        public function getSource()
        {
            return 'co_address';
        }
    
        public function getCreatedDate()
        {
            return $this->created_date;
        }
    
        public function getUpdatedDate()
        {
            return $this->updated_date;
        }
    }
    

    By using skipAttributes, I am instructing the Phalcon model not to update those fields. By doing so, I am letting my triggers worry about that data.

    Conclusion

    It might seem that the task I am delegating is trivial, but in the grand scheme of things the models of an application can be very complex and contain a lot of logic (and so might the controllers). Delegating some of that logic to the RDBMS simplifies things and also improves the performance of the application, which now needs a bit less computational power.

    NOTES

    For a soft delete feature, i.e. automatically updating the deleted field when a DELETE is issued, a trigger will not work. Instead one can use a stored procedure for it. See this Stack Overflow answer.
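
    As a rough sketch of that idea (the procedure name and signature are mine, not from the linked answer), such a procedure only needs to flip the flag; the BEFORE UPDATE trigger shown earlier will take care of updated_date:

    DELIMITER //
    CREATE PROCEDURE sp_address_soft_delete(IN p_id INT UNSIGNED, IN p_user_id INT UNSIGNED)
    BEGIN
        -- Mark the record as deleted instead of physically removing it
        UPDATE co_address
           SET deleted    = 1,
               updated_id = p_user_id
         WHERE id = p_id;
    END
    //
    DELIMITER ;

    CALL sp_address_soft_delete(1, 1);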

  • 2011-12-09 12:00:00

    Change the encoding of a MySQL database to UTF8

    Overview

    As applications grow, so do their audiences. In this day and age, one cannot assume that all the consumers of a web based application will live in a particular region and use only one language. Even if the developer assumes that the web application will serve only one country, there are cases where the latin1 character set will not suffice for storing the data.

    Therefore, developers and database designers need to implement an encoding on their database that will safely store and retrieve any kind of data, not only latin1 based (i.e. the English alphabet).

    For MySQL this means the utf8 character set, typically with the utf8_general_ci collation.

    The problem

    MySQL usually comes with latin1_swedish_ci as the default collation. This will let the developer store data, of course, but there will be a problem as soon as non latin1 characters need to be stored. Latin1 stores each character in a single byte, while languages like Japanese, Thai, Arabic or Greek use characters that need more than one byte of storage. Trying to store a multi-byte character in a single-byte space will fail every time.

    Latin1 based database:
    Input: abcdef...ABCD...#$
    Output: abcdef...ABCD...#$

    Input: 日本語 ภาษาไทย Ελληνικά
    Output: ??? ??????? ????????

    To combat this, all you have to do is change the character set of your database to utf8 and the collation to utf8_general_ci.

    The solution

    I wrote a script in PHP to analyze a database server and produce ALTER statements to be executed against your database(s). The script needs to run from a web server that supports PHP.

    First of all, the encoding of the database will change with the relevant SQL statement. Following that each table's encoding will change, again with the relevant SQL statement. Finally, each TEXT/VARCHAR/CHAR etc. field's encoding will change towards the target encoding you specify in the configuration section (see below).

    The safest way to transform the data is to first change each field to its BINARY counterpart and then change it to the target character set and collation.
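
    For a single (hypothetical) column, the two-step conversion looks like the pair of statements below, which mirror the MODIFY/CHANGE statements the script generates:

    -- Step 1: switch the column to its binary counterpart, leaving the stored bytes untouched
    ALTER TABLE my_table MODIFY `name` VARCHAR(150) CHARACTER SET BINARY;

    -- Step 2: change the column to the target character set and collation
    ALTER TABLE my_table CHANGE `name` `name` VARCHAR(150)
        CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;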

    Configuration

    There are a few configuration variables that need to be set prior to running the script.

    $db_user       = 'username';
    $db_password   = 'password';
    $db_host       = 'hostname';
    $output_folder = '/home/ndimopoulos'; // Do not include trailing slash
    $db_name       = 'mysql';             // Leave this one as is
    
    set_time_limit(0);
    
    /**
     * The old collation (what needs to be changed)
     */
    $from_encoding = 'latin1_swedish_ci';
    
    /**
     * The new collation (what we will change it to)
     */
    $to_encoding = 'utf8_general_ci';
    
    /**
     * The new character set
     */
    $new_collation = 'utf8';
    
    /**
     * Add USE <database>; before each statement?
     */
    $use_database = TRUE;
    

    The $output_folder is a folder that is writable by your web server; it is where the .sql files with the ALTER statements will be created. The script will output one file, <hostname>.sql, which contains all the ALTER statements for all databases. It will also create one file per database, named <hostname>.<database>.sql. You can use either the big file or the individual database files; the choice is yours.

    The $from_encoding is the collation the script will look for; in my case I was checking for latin1_swedish_ci.
    The $to_encoding is the target collation, while $new_collation is (despite its name) the new character set.

    The $use_database is a flag that will allow you to generate statements such as:

    USE <database>; 
    ALTER TABLE <table>.....
    

    if it is on; if it is off, the statement will be:

    ALTER TABLE <table>.....
    

    Databases loop

    The script opens a connection to the server and runs the SHOW DATABASES command. Based on the result returned, it populates an array with the database names.

    The script ignores two databases, information_schema and mysql, but editing the $exclude_databases array will allow you to ignore more databases if you need to.

    mysql_connect($db_host, $db_user, $db_password);
    mysql_select_db($db_name);
    
    $dbs = array();
    
    $exclude_databases = array('mysql', 'information_schema',);
    
    /**
     * Get the databases available (ignore information_schema and mysql)
     */
    $result = mysql_query("SHOW DATABASES");
    
    while ($row = mysql_fetch_row($result))
    {
        if (!in_array($row[0], $exclude_databases))
        {
            $dbs[] = $row[0];
        }
    }
    

    The database names are stored in an array, so as not to keep the database resource active all the time. Had I not done that, I would have had to use three different resources (one for the database, one for the table and one for the field being checked - three nested loops).

    Tables loop

    The script then loops through the $dbs array and selects each database in turn. Once the database is selected, the SHOW TABLES query is run and a $tables array is populated with the names of the tables in that database. At the same time the ALTER DATABASE statements are being generated.

    mysql_select_db($db);
    
    $db_output = '';
    
    $statement  = "\r\n#-------------------------------------------------\r\n\r\n";
    $statement .= "USE $db;\r\n";
    $statement .= "\r\n#-------------------------------------------------\r\n\r\n";
    $statement .= "ALTER DATABASE $db "
               . "CHARACTER SET $new_collation COLLATE $to_encoding;\r\n";
    $statement .= "\r\n#-------------------------------------------------\r\n\r\n";
    
    $db_output .= $statement;
    $output    .= $statement;
    
    $tables     = array();
    
    $result = mysql_query("SHOW TABLES");
    
    while ($row = mysql_fetch_row($result))
    {
        if (!in_array($row[0], $exclude_tables))
        {
            $tables[] = mysql_real_escape_string($row[0]);
        }
    }
    

    Fields loop

    The script then loops through the $tables array and runs the SHOW FIELDS query so as to analyze each field.

    $fields_modify = array();
    $fields_change = array();
    
    $result = mysql_query("SHOW FULL FIELDS FROM `$table`");
    while ($row = mysql_fetch_assoc($result)) 
    {
        if ($row['Collation'] != $from_encoding)
        {
            continue;
        }
    
        // Is the field allowed to be null?
        $nullable = ($row['Null'] == 'YES') ? ' NULL ' : ' NOT NULL';
    
        if ($row['Default'] == 'NULL') 
        {
           $default = " DEFAULT NULL";
        } 
        else if ($row['Default']!='') 
        {
           $default = " DEFAULT '" . mysql_real_escape_string($row['Default']) . "'";
        } 
        else 
        {
           $default = '';
        }
    
        // Alter field collation:
        $field_name = mysql_real_escape_string($row['Field']);
    
        $fields_modify[] = "MODIFY `$field_name` $row[Type] CHARACTER SET BINARY";
        $fields_change[] = "CHANGE `$field_name` `$field_name` $row[Type] "
                         . "CHARACTER SET $new_collation "
                         . "COLLATE $to_encoding $nullable $default";
    }
    

    The two arrays generated ($fields_modify and $fields_change) contain the MODIFY and CHANGE statements for each field. Using implode, we can easily construct the ALTER statements.

    $statement .= "ALTER TABLE `$table` " 
                . implode(' , ', $fields_modify) . "; \r\n";
    $statement .= "ALTER TABLE `$table` " 
                . implode(' , ', $fields_change) . "; \r\n";
    

    Notes

    As mentioned earlier, you can use the $exclude_databases array to keep certain databases from being processed. You can also use the $exclude_tables array to keep certain tables from being processed.

    The $exclude_tables_fields array allows you to exclude a table from having its fields processed. However, this is not tied to a specific database, so any database that contains a table with that particular name will have its fields skipped. With a bit of refactoring you can make the script work best for your needs.

    If you set the $use_database variable to TRUE then each line in your .sql files will be prefixed with a 'USE <database>;' statement. This is to help the accompanying bash script execute each statement in the respective database. If you do not intend to run this process one statement at a time, you can set this to FALSE. You can then run each database .sql file (or the one that contains all of the statements from all databases) as one single command.

    Server processing

    Now that the relevant .sql files have been created, all you have to do is upload them on your web server. There are three ways of actually running the statements against the database.

    Please make sure you backup your data first!

    Single file processing

    mysql -h<host> -u<username> -p<password> < /path/to/scripts/<host>.sql
    

    The command above will run all the statements generated in the <host>.sql file for all databases. It is going to be taxing for your database server and there is no error handling or reporting. You can always redirect the output to a file (just append "> /path/to/output/output.txt" at the end of the command). If this method fails for some reason (MySQL has gone away), it will be difficult to resume; you will need to edit the .sql file to remove the statements that have already been processed.

    Per database

    mysql -h<host> -u<username> -p<password> <database_name> \
           </path/to/scripts/<host>.<database>.sql
    

    The command above will run all the statements generated in the <host>.<database>.sql file for that particular database. This method is similar to the one above. Again, you can redirect the output to a file (just append "> /path/to/output/output.txt" at the end of the command).

    Single file processing (per statement)

    /path/to/scripts/process.sh
    

    All you need to do is edit the process.sh script and change the relevant parameters to match your environment and upload it to your server. The source file that the process.sh script will read has to be generated with $use_database set to TRUE.

    The process.sh script is:

    #!/bin/bash
    
    DBUSER=root
    DBPASS=1234
    DBHOST=localhost
    SOURCE="/home/ndimopoulos/host.sql"
    LOG="/home/ndimopoulos/conversion.log"
    
    while read line
    do
        TIMENOW=`date +%Y-%m-%d-%H-%M`
        echo START $TIMENOW $line
        echo START $TIMENOW $line >> $LOG
        /usr/bin/time -f "%E real,%U user,%S sys" -v -o $LOG -a \
            mysql -h$DBHOST -u$DBUSER -p$DBPASS -e "$line"
    
        TIMENOW=`date +%Y-%m-%d-%H-%M`
        echo END $TIMENOW 
        echo END $TIMENOW >> $LOG
    
    done < $SOURCE
    
    exit 0
    

    The script above reads the source file (.sql) and executes each statement in turn, using time to measure how long each command takes. The output ends up in a log file, which can easily be tailed to view the progress and used later on for analysis. The results of the processing are also sent to the screen. You can change the parameters of the time command to match your needs.

    The output will look something like the block below:

    START 2011-12-08-23-46 USE mydatabase; \
         ALTER TABLE `tablename` DEFAULT CHARACTER SET utf8;
     Command being timed: "mysql -uroot -p1234 -e USE mydatabase; \
         ALTER TABLE `tablename` DEFAULT CHARACTER SET utf8;"
     User time (seconds): 0.01
     System time (seconds): 0.00
     Percent of CPU this job got: 0%
     Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.16
     Average shared text size (kbytes): 0
     Average unshared data size (kbytes): 0
     Average stack size (kbytes): 0
     Average total size (kbytes): 0
     Maximum resident set size (kbytes): 8192
     Average resident set size (kbytes): 0
     Major (requiring I/O) page faults: 0
     Minor (reclaiming a frame) page faults: 610
     Voluntary context switches: 11
     Involuntary context switches: 5
     Swaps: 0
     File system inputs: 0
     File system outputs: 0
     Socket messages sent: 0
     Socket messages received: 0
     Signals delivered: 0
     Page size (bytes): 4096
     Exit status: 0
    END 2011-12-08-23-46
    

    Conclusion

    In order for a database to be best prepared to support localization, you need to make sure that the storage will accept any possible character. You can start by creating all your tables and fields with the utf8 character set, but for existing databases and data you will need to run expensive processing queries on your RDBMS. Ensuring that the data does not get corrupted during the transformation is essential, so make sure you back up your databases before running the output statements produced by the db_alter.php script.
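
    For new schemas you can avoid the whole exercise by declaring the character set up front; a minimal example (the database and table names are just placeholders):

    CREATE DATABASE my_db CHARACTER SET utf8 COLLATE utf8_general_ci;

    CREATE TABLE my_db.customers (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
        name VARCHAR(100) NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;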

    PHP script (db_alter.php)

    $db_user       = 'username';
    $db_password   = 'password';
    $db_host       = 'hostname';
    $output_folder = '/home/ndimopoulos'; // Do not include trailing slash
    $db_name       = 'mysql'; // Leave this one as is
    
    set_time_limit(0);
    
    /**
     * The old collation (what needs to be changed)
     */
    $from_encoding = 'latin1_swedish_ci';
    
    /**
     * The new collation (what we will change it to)
     */
    $to_encoding = 'utf8_general_ci';
    
    /**
     * The new character set
     */
    $new_collation = 'utf8';
    
    /**
     * Add USE <database> before each statement?
     */
    $use_database = TRUE;
    
    mysql_connect($db_host, $db_user, $db_password);
    mysql_select_db($db_name);
    
    $dbs = array();
    
    $exclude_databases     = array('mysql', 'information_schema',);
    $exclude_tables        = array('logs', 'logs_archived',);
    $exclude_tables_fields = array('activities');
    
    /**
     * Get the databases available (ignore information_schema and mysql)
     */
    $result = mysql_query("SHOW DATABASES");
    
    while ($row = mysql_fetch_row($result)) 
    {
        if (!in_array($row[0], $exclude_databases))
        {
            $dbs[] = $row[0];
        }
    }
    
    $output = '';
    /**
     * Now select each db and start parsing the tables
     */
    foreach ($dbs as $db)
    {
        mysql_select_db($db);
        $db_output = '';
    
        $statement  = "\r\n#----------------------------------------\r\n\r\n";
        $statement .= "USE $db;\r\n";
        $statement .= "\r\n#----------------------------------------\r\n\r\n";
        $statement .= "ALTER DATABASE $db "
                   . "CHARACTER SET $new_collation COLLATE $to_encoding;\r\n";
        $statement .= "\r\n#----------------------------------------\r\n\r\n";
    
        $db_output .= $statement;
        $output    .= $statement;
        $tables     = array();
    
        $result = mysql_query("SHOW TABLES");
    
        while ($row = mysql_fetch_row($result))
        {
            if (!in_array($row[0], $exclude_tables))
            {
                $tables[] = mysql_real_escape_string($row[0]);
            }
        }
    
        /**
         * Alter statements for the tables
         */
        foreach ($tables as $table)
        {
            $statement = '';
            if ($use_database)
            {
                $statement  = "USE $db; ";
            }
            $statement .= "ALTER TABLE `$table` "
                       . "DEFAULT CHARACTER SET $new_collation;\r\n";
            $db_output .= $statement;
            $output    .= $statement;
        }
        $statement  = "\r\n#----------------------------------------\r\n\r\n";
    
        $db_output .= $statement;
        $output    .= $statement;
    
        /**
         * Get the fields for each table
         */
        foreach ($tables as $table)
        {
            if (in_array($table, $exclude_tables_fields))
            {
                continue;
            } 
    
            $fields_modify = array();
            $fields_change = array();
    
            $result = mysql_query("SHOW FULL FIELDS FROM `$table`");
            while ($row = mysql_fetch_assoc($result)) 
            {
                if ($row['Collation'] != $from_encoding)
                {
                    continue;
                }
    
                // Is the field allowed to be null?
                $nullable = ($row['Null'] == 'YES') ? ' NULL ' : ' NOT NULL';
    
                if ($row['Default'] == 'NULL') 
                {
                    $default = " DEFAULT NULL";
                } 
                else if ($row['Default']!='') 
                {
                    $default = " DEFAULT '"
                             . mysql_real_escape_string($row['Default']) . "'";
                }
                else 
                {
                    $default = '';
                }
    
                // Alter field collation:
                $field_name = mysql_real_escape_string($row['Field']);
    
                $fields_modify[] = "MODIFY `$field_name` {$row['Type']} "
                                 . "CHARACTER SET BINARY";
                $fields_change[] = "CHANGE `$field_name` `$field_name` {$row['Type']} "
                                 . "CHARACTER SET $new_collation "
                                 . "COLLATE $to_encoding $nullable $default";
            }
    
            if (count($fields_modify) > 0)
            {
                $statement = '';
                if ($use_database)
                {
                    $statement = "USE $db; ";
                }
                $statement .= "ALTER TABLE `$table` "
                            . implode(' , ', $fields_modify) . "; \r\n";
                if ($use_database)
                {
                    $statement .= "USE $db; ";
                }
                $statement .= "ALTER TABLE `$table` "
                            . implode(' , ', $fields_change) . "; \r\n";
    
                $db_output .= $statement;
                $output    .= $statement;
            }
        }
    
        $bytes = file_put_contents(
            $output_folder . '/' . $db_host . '.' . $db . '.sql', $db_output
        );
    }
    
    $bytes = file_put_contents($output_folder . '/' . $db_host . '.sql', $output);
    
    echo "<pre>$db_host $bytes \r\n$output</pre>";
    

    Downloads

    You can use these scripts at your own risk. Also feel free to distribute them freely - a mention would be nice. Both scripts can be found in my GitHub.

  • 2010-02-02 12:00:00

    Design Patterns - Factory

    A note about this series: it appears that Giorgio Sironi and I had the same idea regarding Design Patterns and blogging about them. He covers the Factory design pattern thoroughly in his blog post, which is recommended reading.

    The Problem

    I started off my IT career as a network administrator. This was back in the good old Novell 3.11 days. After that it was Novell 4.0, Microsoft Servers etc. Following that I got more and more involved with Visual Basic and when Microsoft decided to move everyone to .NET I chose not to follow and ended up coding in PHP.

    Since my programming knowledge came from within (studying, reading articles, trial and error), the problems that I was facing on a daily basis are the same as almost every developer faces. One particularly challenging problem that I had in the VB days as well as the PHP days was repetition of code and how to eliminate it.

    When I started programming for the Ferrari Fans Fun Forecast site I was running the site using my apartment's ADSL line. In the beginning there were only 20 users or so, therefore that setup was fine. The scripts were VBScript against a local instance of Microsoft SQL Server. Later on though, I switched to PHP while keeping Microsoft SQL Server.

    Initial implementation

    I knew that I would have to change my scripts later on to work against MySQL, since I was going to change the hosting of the site. Through laziness or poor design (you can pick either or both :)) I chose to create a class for Microsoft SQL Server and later on just change it to a MySQL one. It seemed the easiest thing to do at the time.

    So my class was something like:

    class DbMSSQL
    {
        private $_conn = null;
    
        public function connect()
        {
            // Connect
            // This is where we have the connection parameters
            include_once 'connection.inc.php';
    
            $this->_conn = mssql_connect($host, $user, $password);
            if (!$this->_conn) {
                throw new Exception('Cannot connect to the database.');
            }
        }
    
        public function disconnect()
        {
             // Disconnect
            mssql_close($this->_conn);
        }
    
        public function selectdb()
        {
             // Select the db
            $db = mssql_select_db($database, $this->_conn);
            if (!$db) {
                throw new Exception('Cannot select the database');
            }
        }
    
        public function query($sql)
        {
         // Query the db
            $_result = mssql_query($sql);
    
            if (!$_result) {
                throw new Exception('Error in query : ' . $sql);
            }
    
            $data = array();
    
            while ($row = mssql_fetch_assoc($_result)) {
                $data[$row['id']] = $row;
            }
    
            mssql_free_result($_result);
    
            return $data;
        }
    }
    

    Everything worked fine so I did not worry about a thing. A few months later though I was forced to move the database (as I expected) to MySQL. I could not move everything in one go so I had to move some of the tables initially and a week later everything else.

    To tackle this requirement I created a second class to handle operations against MySQL. The class that I ended up with was:

    class DbMySQL
    {
        private $_conn = null;
    
        public function connect()
        {
            // Connect
            // This is where we have the connection parameters
            include_once 'connection.inc.php';
    
            $this->_conn = mysql_connect($host, $user, $password);
            if (!$this->_conn) {
                throw new Exception('Cannot connect to the database :' . mysql_error());
            }
        }
    
        public function disconnect()
        {
             // Disconnect
            mysql_close($this->_conn);
        }
    
        public function selectdb()
        {
             // Select the db
            $db = mysql_select_db($database, $this->_conn);
            if (!$db) {
                throw new Exception('Cannot select the database : ' . mysql_error());
            }
        }
    
        public function query($sql)
        {
         // Query the db
            $_result = mysql_query($sql);
    
            if (!$_result) {
                throw new Exception('Error in query : ' . mysql_error() . "\n" . $sql);
            }
    
            $data = array();
    
            while ($row = mysql_fetch_assoc($_result)) {
                $data[$row['id']] = $row;
            }
    
            mysql_free_result($_result);
    
            return $data;
        }
    }
    

    You can easily see the problem here. There is a lot of repetition in the code, not so much in the properties as in the methods themselves: both classes have connect(), disconnect(), selectdb() and query(). In reality the code changes only slightly, since the calls for operations against Microsoft SQL Server are mssql_* while for MySQL they are mysql_*. During the transition week I was in programming hell: at some point I mixed up the class names, I was reading from and updating the wrong server, and so on (see Jani Hartikainen's post about 6 programming project mistakes you should avoid - I did all that!).

    That week taught me that I need to pay more attention to design instead of going full speed ahead with programming and paying the consequences later on.

    After thorough research, I discovered a library that would support both platforms. The library that I found was ADOdb which is a perfect example of the Factory Pattern. I used that library later on for a different project, but just looking at the code and understanding the flow of operations as well as the implementation of the pattern itself was invaluable to me.

    Interfaces

    First of all I need to explain what an interface is and why we need them. According to PHP.net:

    Object interfaces allow you to create code which specifies which methods a class must implement, without having to define how these methods are handled.

    So think of an interface as something like a contract, a blueprint of what I need to construct. The interface will have the common methods and properties that I need to implement.

    When dealing with database connections to two different database servers (Microsoft SQL Server and MySQL), I can clearly define a few methods that will follow CRUD (Create, Read, Update, Delete). Those are:

    • Connect to the database server
    • Select database
    • Insert record
    • Delete record
    • Update record
    • Select record(s)
    • Close connection to the database server

    My interface would therefore be:

    interface iDatabase
    {
        public function connect();
        public function disconnect();
        public function selectdb();
        public function query($sql);
    }
    

    The Factory Pattern

    A class implementing the Factory Pattern is like a car manufacturing plant producing three different cars on the same assembly line. All cars have common characteristics, like 4 wheels, 4 doors (well, most of them), a steering wheel, a dashboard etc., and all of them perform certain operations, e.g. drive, reverse etc.

    In my earlier problem I could have used the Factory Pattern, with classes implementing my blueprint: the interface which defines the CRUD operations that I need. Based on the above, the implementation results in three classes.

    MSSQL class - stored in the file Db_mssql.php
    class Db_mssql implements iDatabase
    {
        private $_conn = null;
    
        public function connect()
        {
            // Connect
            // This is where we have the connection parameters
            include_once 'connection.inc.php';
    
            $this->_conn = mssql_connect($host, $user, $password);
            if (!$this->_conn) {
                throw new Exception('Cannot connect to the database.');
            }
        }
    
        public function disconnect()
        {
             // Disconnect
            mssql_close($this->_conn);
        }
    
        public function selectdb()
        {
             // Select the db
            $db = mssql_select_db($database, $this->_conn);
            if (!$db) {
                throw new Exception('Cannot select the database');
            }
        }
    
        public function query($sql)
        {
         // Query the db
            $_result = mssql_query($sql);
    
            if (!$_result) {
                throw new Exception('Error in query : ' . $sql);
            }
    
            $data = array();
    
            while ($row = mssql_fetch_assoc($_result)) {
                $data[$row['id']] = $row;
            }
    
            mssql_free_result($_result);
    
            return $data;
        }
    }
    
    MySQL class - stored in the file Db_mysql.php
    class Db_mysql implements iDatabase
    {
        private $_conn = null;
    
        public function connect()
        {
            // Connect
            // This is where we have the connection parameters
            include_once 'connection.inc.php';
    
            $this->_conn = mysql_connect($host, $user, $password);
            if (!$this->_conn) {
                throw new Exception('Cannot connect to the database :' . mysql_error());
            }
        }
    
        public function disconnect()
        {
             // Disconnect
            mysql_close($this->_conn);
        }
    
        public function selectdb()
        {
             // Select the db
            $db = mysql_select_db($database, $this->_conn);
            if (!$db) {
                throw new Exception('Cannot select the database : ' . mysql_error());
            }
        }
    
        public function query($sql)
        {
         // Query the db
            $_result = mysql_query($sql);
    
            if (!$_result) {
                throw new Exception('Error in query : ' . mysql_error() . "\n" . $sql);
            }
    
            $data = array();
    
            while ($row = mysql_fetch_assoc($_result)) {
                $data[$row['id']] = $row;
            }
    
            mysql_free_result($_result);
    
            return $data;
        }
    }
    

    Notice that both these classes are almost identical to the initial implementation shown earlier in this post. The only difference is that they are both implementing the iDatabase interface.

    So what is different now? The class that implements the Factory Pattern.

    class Db
    {
        public static function factory($type)
        {
            $fileName = 'Db_' . strtolower($type) . '.php';
            if (!file_exists($fileName)) {
                throw new Exception('File not found : ' . $fileName);
            }

            // Load the adapter class file before instantiating it
            require_once $fileName;

            $className = 'Db_' . strtolower($type);

            return new $className;
        }
    }
    

    This class allows me to load the relevant database connection class on the fly. If I want a Microsoft SQL Server connection I would call:

        $mssql = Db::factory('mssql');
    

    while for MySQL the command becomes:

        $mysql = Db::factory('mysql');
    

    Again, since both underlying classes implement the iDatabase interface, I know exactly what to expect from each class as far as methods and functionality are concerned.

    Conclusion

    The Factory Design Pattern is one of the most powerful design patterns. It provides 'decoupling', i.e. it removes the dependency of the calling code on specific concrete classes. It also allows for great flexibility while keeping the same interface for your clients.

    Zend Framework uses the Factory Pattern in Zend_Db. Specifically the example on the site shows:

    // We don't need the following statement because the
    // Zend_Db_Adapter_Pdo_Mysql file will be loaded for us by
    // the Zend_Db factory method.
    
    // require_once 'Zend/Db/Adapter/Pdo/Mysql.php';
    
    // Automatically load class Zend_Db_Adapter_Pdo_Mysql
    // and create an instance of it.
    $db = Zend_Db::factory(
        'Pdo_Mysql', 
        array(
            'host'     => '127.0.0.1',
            'username' => 'webuser',
            'password' => 'xxxxxxxx',
            'dbname'   => 'test'
        )
    );
    

    The Zend_Db factory accepts the name of the adapter used for the database connection as its first parameter, while the second parameter is an array with connection specific information. With the use of the Factory Pattern, Zend_Db exposes a common interface which allows programmers to connect to a number of databases using the same methods. Should the application need to access a different database in the future, the impact on the developer is minimal; in most cases a change to the adapter name (the first parameter of the factory method) is all it takes.

  • 2010-01-22 12:00:00

    Design Patterns - Singleton

  • 2009-11-03 12:00:00

    Flexible storage in MySQL

    We all need data to function. Whether this is information regarding what our body craves at the moment - hence go to the local take-away and get it or cook it - or whether this is electronic data to make our tasks easier, makes no difference.

    Storing data in an electronic format is always a challenge. When faced with a new project, you always try to outthink the project's needs and ensure that you have covered all the possible angles. Some projects, though, are plain vanilla: say you only need to enter the customer's name, surname, address and phone. But what happens when you need to store data whose type you do not know in advance?

    This is where flexible storage comes into play. You can develop a database schema that will store data efficiently (well, within reason) without knowing in advance what the data will be.

    Say we need to build an application that will be given to the customer to store data about their contacts, without knowing what fields the customer needs. Storing the name, surname, address, phone etc. of a contact is easy enough and expected. But what about a customer that needs to store the operating system a contact uses on their computer? How about storing the contact's favorite food recipe, or their car mileage? Information is so diverse that you can predict what is needed only up to a point; after that you just face chaos. Of course if the application we are building is intended for one customer then everything is simpler. What if more than one customer is our target audience? We certainly cannot fill the database with fields that will definitely be empty for certain customers.

    A simple format for storing this information is a type and value pair. The first field (type_id) will be a numeric one holding the ID of the field, while the second field (field_data) will be of TEXT type for the value. The reason for TEXT is that we do not know the size of the data that will be stored there. Indexes on both fields help speed up the searches, and if you use MySQL 4+ you can opt for a FULLTEXT index on the value column rather than the indexing available in previous MySQL versions.

    We also need a second table to hold the list of our data types. This table will have two columns: an auto-increment integer ID and a VARCHAR column to hold the description of the field.

    CREATE TABLE data_types (
        type_id MEDIUMINT( 8 ) UNSIGNED NOT NULL AUTO_INCREMENT,
        type_name VARCHAR( 50 ) NOT NULL,
        PRIMARY KEY ( type_id )
    );
    

    The table to store the data in will be as follows:

    CREATE TABLE data_store (
        cust_id MEDIUMINT( 8 ) UNSIGNED NOT NULL ,
        type_id MEDIUMINT( 8 ) UNSIGNED NOT NULL ,
        field_data TEXT NOT NULL,
        PRIMARY KEY ( cust_id, type_id )
    );
    

    And also creating another index:

    ALTER TABLE data_store ADD FULLTEXT (field_data);
    

    (Note that the FULLTEXT support is a feature of MySQL version 4+)

    So what does this table do for us? Say we need to store the information of Mr. John Doe, 123 Somestreet Drive, VA, USA, +1 (000) 12345678, who likes cats and has a Ford Mustang.

    We first add the necessary fields we need to store in our data_types table. These fields for our example are as follows:

    1 - Title

    2 - Country

    3 - Favorite animal

    4 - Car

    The numbers in front are the IDs that I got when entering the data in the table.

    Assuming that the customer has a unique id of 1, we are off to store the data in our table. In essence we will be adding 4 records into the data_store table for every contact we have. The cust_id field holds the unique ID for each customer so that we can match the information to a single contact as a block.

    INSERT INTO data_store
        (cust_id, type_id, field_data)
        VALUES
        ('1', '1', 'Mr.'),
        ('1', '2', 'USA'),
        ('1', '3', 'Cat'),
        ('1', '4', 'Ford Mustang');
    

    That's it. Now Mr. John Doe is in our database.

    Adding a new field is as easy as adding a new record to the data_types table. With a bit of clever PHP you can then read the data_types table and display the corresponding values from the data_store table.
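
    For example, adding a 'Favorite food' field and filling it in for Mr. Doe is just two inserts (LAST_INSERT_ID() picks up the type_id the new row was assigned):

    INSERT INTO data_types (type_name) VALUES ('Favorite food');

    INSERT INTO data_store (cust_id, type_id, field_data)
        VALUES (1, LAST_INSERT_ID(), 'Souvlaki');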

    We can use the above example to store customer data either as a whole or as a supplement. For instance, we can store the customer ID, first name, surname etc. as fields in the data_store table too, each with its own data type. Alternatively, we can keep the core data in a separate table (storing the first name, surname, address etc.) and link that table with the data_store one.

    This approach, although very flexible, has its disadvantages. The first one is that each record uses a TEXT field to store its data, which is overkill for data types that are meant to hold boolean values or integers. Another big disadvantage is searching through the table: the data is TEXT and it is also structured vertically in blocks. So if you need to find everyone living in the USA, you first have to find the data type representing the Country field and then match it against the field_data column of the data_store table.
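
    For instance, finding every contact living in the USA ends up being a join between the two tables, where a plain column lookup would normally do:

    SELECT ds.cust_id
        FROM data_store ds
        INNER JOIN data_types dt ON dt.type_id = ds.type_id
        WHERE dt.type_name = 'Country'
          AND ds.field_data = 'USA';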

    There is no one right way of doing something in programming. It all depends on the circumstances and, of course, on the demands of the application we are developing.

    This is just another way of storing data.