Code Reading Notes - MySQL Server Partitioning

MySQL provides a general interface of storage engine handlers. A variety of implementations are built into MySQL, including MyISAM, InnoDB and Partitioning. Most of storage engines do not support partitioning natively [ref]. Therefore Mikael Ronström implemented a partitioning storage engine on top of existing engines since MySQL 5.1. Since there isn't too much info about the implementation, I would like to share some observations with you.

Note that there is a great slide on partitioning by the author, see [ref].

MySQL Version: 5.5 from Twitter

The Big Picture of MySQL

When the client sends a command to mysqld, it will parse it in do_command/dispatch_command/mysql_parse, and then mysql_execute_command. The actual command is handled by a storage engine handler, defined in some source file starting with ha_ prefix.

All handlers share a common interface defined in /sql/handler.h. It will handle create table, for instance, in handler::create(). Note here that handlers need to be initialized. They are stateful and may contain handler specific files.

The partition handler ha_partition.h is such a handler that only receives incoming requests and dispatch the actual storage staff down to underlying engines like InnoDB inside some involving partitions. The partition handler itself doesn't do too much except for maintaining the metadata of underlying engines in all partitions and other works including optimization.

A Trace of create table

Here, we take the following SQL statement as an example, and examine its trace. You can read [ref] at the same time.

create table employee  
    (id int auto_increment, 
     name varchar (50), 
     primary key (id)) 
engine = innodb  
partition by hash (id)  
partitions 4;  

This will create a table with 4 partitions, partitioned using hash on column id. There are other partitioning options available.

The query will then be processed here

>do_command
| >dispatch_command: COM_QUERY
| | >mysql_parse
| | | >mysql_execute_command: SQLCOM_CREATE_TABLE

Receiving, Parsing and Dispathing

do_command is called inside a loop in /sql/conn_handler/connection_handler_per_thread.cc. This is the server loop for one client. It loops until the connection is closed. The implementation of do_command is in /sql/sql_parse.cc

Then right before it returns, dispatch_command is called. Inside it is a big switch upon different input command. In our case, the command goes to COM_QUERY branch.

Then mysql_parse is called. It parses the query and mysql_execute_command. Here it is again a big switch upon the parserd sql query. In our case, it is SQLCOM_CREATE_TABLE

>mysql_execute_command: SQLCOM_CREATE_TABLE
| >create_table_precheck
| | >check_access
| | >check_grant
| >mysql_create_table

Info for Creating a Table

Here it mostly uses the info provided by create_info from the parsing result to create the table. This struct come from st_ha_create_information in /sql/handler.h, and an important member of it is db_type pointer of type handlerton defined in the same header file.

Handlerton and Plugin Initialization

handlerton is a singleton structure (per storage engine) to provide access to storage engine functionality that works on the global level (unlike handler class that works on a per-table basis) [ref]. This is initialized during plugin initialization phase when mysqld boots.

>plugin_initialize
| >ha_initialize_handlerton
| | plugin: initialize plugin: 'partition'
| | > partition_initialize

This will allocate the handlerton and call the plugin initialization routine. In our case, the plugin definition is located at the bottom of /sql/ha_partition.cc using macro mysql_declarer_plugin (/include/mysql/plugin.h). It basically defines a plugin structure st_mysql_plugin (/include/mysql/plugin.h) and fills the init function pointer with an implementation partition_initialize (/sql/ha_partition.cc).

Therefore partition_initialize is called with a pre-allocated handlerton as argument. This initialization further fills handlerton function pointers create with a concrete partition_create_handler. This will be the function that allocates table level handler later.

Partition Info

After updating create_info with more info from parsing, we see partition related operations - to handle partition info from thd->lex->part_info from parsing. partition_info is a class defined in /sql/partition_info.h, containing information for partition handler like a list of partition_element, fields used for partition, partition id, corresponding table, partition type, etc,.

Creating the Table

Then, we enter mysql_create_table in /sql/sql_table.cc. This is just a wrapper, it opens and locks table metadata, and further calls into mysql_create_table_no_lock.

>mysql_create_table
| >open_and_lock_tables
| | >open_tables
| | | >open_and_process_table
| | | | >open_table
| | | | | >check_if_table_exists
| | | | | | >build_table_filename: db:'testdb1'  table_name:'employee'  ext:'.frm'
| | | | | | | >tablename_to_filename
| | | | | | | | >check_if_legal_tablename
| | | | | | >ha_check_if_table_exists
| | >lock_tables
| >mysql_create_table_no_lock

Creating the InnoDB Handler

In mysql_create_table_no_lock, it will get the real storage engine handler and fill in those metadata. But first, it needs partition info.

>mysql_create_table_no_lock
| >check_engine
| >build_table_filename
| >load_db_opt
| >get_new_handler
| | info: db_type = InnoDB create_info->db_type = InnoDB

This is done through the get_new_handler call. This is defined in /sql/handler.cc. It uses the storage engine level handlerton(via create_info->db_type) to create a table level handler. Here, the get_new_handler call first creates a InnoDB handler because for the partitioned table, the underlying engine is InnoDB.

Checking the Partition Info

>mysql_create_table_no_lock
| ...
| >partition_info::check_partition_info
| | info: default table_engine = InnoDB
| | >partition_info::fix_parser_data
| | >partition_info::set_up_defaults_for_partitioning
| | >partition_info::set_up_default_partitions
| | | >create_default_partition_names
| | info: Using table_engine = InnoDB
| | >partition_info::has_unique_names
| | | >partition_info::has_unique_name
| | | >partition_info::has_unique_name
| | | >partition_info::has_unique_name
| | | >partition_info::has_unique_name
| | info: part = 0 engine = InnoDB
| | info: part = 1 engine = InnoDB
| | info: part = 2 engine = InnoDB
| | info: part = 3 engine = InnoDB
| | >partition_info::check_engine_mix
| | | info: in: engine_type = InnoDB, table_engine_set = 1
| | | info: part = 0 engine = InnoDB table_engine_set 1
| | | >check_engine_condition
| | | >check_engine_condition
| | | >check_engine_condition
| | | >check_engine_condition
| | | info: engine_type = InnoDB
| | | info: out: engine_type = InnoDB
| >generate_partition_syntax
| info: db_type: InnoDB

Then, the partitioning staff kicks in. It reads the part_info from thd->work_part_info, and uses create_info to fill info into it, particularly underlying engine type (here it is InnoDB). The paritition_info::check_partition_info will check that the partition info requested is possible to set-up in this version. You can see the trace info saying that there are four partitions with underlying engine being InnoDB. This is exactly how we specified in the query.

Replacing InnoDB Handler with Partition Handler

Then something interesting happens here in mysql_create_table_no_lock.

 if ((!(engine_type->partition_flags 
      && engine_type->partition_flags() & HA_CAN_PARTITION)) 
    || create_info->db_type == partition_hton) {
    /*
        The handler assigned to the table cannot handle partitioning.
        Assign the partition handler as the handler of the table.
    */
    delete file;
    create_info->db_type = partition_hton;

Note this, "the handler assigned to the table cannot handle partitioning. Assign the partition handler as the handler of the table".

Previously, we create the variable file of type handler by calling get_new_handler and setup InnoDB handler. Now we delete it and ready to use partition handler. Thus it calls get_ha_partition with part_info.

>get_ha_partition
| >ha_partition::ha_partition(part_info)
| >ha_partition::initialize_partition
| | >ha_partition::new_handlers_from_part_info
| | | >get_new_handler
| | | >get_new_handler
| | | >get_new_handler
| | | >get_new_handler
| >ha_partition::table_flags

get_ha_partition is defined in /sql/handler.cc. It allocates a new handler ha_partition with part_info and initializes it by calling initialize_partition. As you can see from the trace, this is where partition handler allocates underlying engine handlers: there are four get_new_handler for the four InnoDB partitions.

Creating Table and Partition Definition Files

It then performs some preperaing, checking and other stuff. Then it reaches rea_create_table (/sql/unireg.cc) which is responsible for creating .frm file. And then create_handler_files creates the .par file.

>rea_create_table
| >mysql_create_frm
| >ha_partition::create_handler_files()
| | >create_handler_files()
| >ha_create_table
| | >init_tmp_table_share
| | >open_table_def
| | | >open_binary_frm
| | | | >get_form_pos
| | | | info: default_part_db_type = 12
| | | | info: extra segment size is 66 bytes
| | | | info: setting dbtype to 'partition' (20)
| | | | info: Found format section
| | | | info: format_section_length: 11, format_section_flags: 0
| | | | info: tablespace: '<null>'
| | | | >get_new_handler
| | | | | >ha_partition::ha_partition(table)
| | | | | >ha_partition::initialize_partition
| | | | | | >ha_partition::get_from_handler_file
| | | | | | | >ha_partition::read_par_file
| | | | | | | >ha_partition::setup_engine_array
| | | | | | | | >create_handlers
| | | | | | | | | >get_new_handler
| | | | | | | | | >get_new_handler
| | | | | | | | | >get_new_handler
| | | | | | | | | >get_new_handler
| | | | | | | | | info: InnoDB
| | | | | >ha_partition::table_flags
| | | | >find_type
| | | | >bitmap_init
| | | | >ha_partition::~ha_partition()
| | | | | >ha_partition::destroy_record_priority_queue
| | >ha_partition::create
| | >closefrm
| | >free_table_share

In ha_create_table, it gets the table handler from table->file (here the partition handler), and call the handler create function. it then goes through the trace shown above.

One thing to mention is the get_new_handler. This again uses the handlerton to create a new handler. Back in the initialize of partition handlerton, the create pointer points to the partition handler's partition_create_handler function in /sql/ha_partition.cc. This time we already have the .par file, so it reads from it and creates underlying InnoDB engines.

Creating InnoDB Tables

It then goes into InnoDB to create the actual table. And the trace below pretty much explains itself.

>ha_partition::create
| >del_ren_cre_table()
| | >ha_partition::get_from_handler_file
| | >set_up_table_before_create
| | >ha_innobase::create
| | | >create_table_def
| | | | enter: table_name: testdb1/employee#P#p0
| | | >create_index
| | >set_up_table_before_create
| | >ha_innobase::create
| | | >create_table_def
| | | | enter: table_name: testdb1/employee#P#p1
| | | >create_index
| | >set_up_table_before_create
| | >ha_innobase::create
| | | >create_table_def
| | | | enter: table_name: testdb1/employee#P#p2
| | | >create_index
| | >set_up_table_before_create
| | >ha_innobase::create
| | | >create_table_def
| | | | enter: table_name: testdb1/employee#P#p3
| | | >create_index

Returning to the Client

mysqld then closes the .frm/.par file, frees memory, flushing status to clients, and it all comes to an end.

>dispatch_command: COM_QUERY
| >mysql_parse
| | >mysql_execute_command: SQLCOM_CREATE_TABLE
| | | >mysql_create_table
| | | ...
| | | >set_ok_status
| | | proc_info: query end
| | | >trans_commit_stmt
| | | >st_select_lex_unit::cleanup
| | | | >st_select_lex::cleanup()
| | | proc_info: closing tables
| | | >close_thread_tables
| | | | >THD::binlog_flush_pending_rows_event
| | | | >THD::binlog_flush_pending_rows_event
| | | | >mysql_unlock_tables
| | | >stmt_causes_implicit_commit
| | | >trans_commit_implicit
| | | >MDL_context::release_transactional_locks
| | proc_info: freeing items
| | >lex_end
| | >Query_arena::free_items
| info: query ready
| >Protocol::end_statement
| | >Protocol::send_ok
| | | >net_send_ok
| | | | >net_flush
| | | | info: OK sent, so no more error sending allowed

Wrap Up

  1. do_command: receive commands from client
  2. dispatch_command: COM_QUERY
  3. mysql_parse: parse the command
  4. mysql_execute_command: SQLCOM_CREATE_TABLE
  5. mysql_create_table: check partition info, replace InnoDB handler with partition handler
  6. ha_create_table: creates .frm/.par files, creates underlying InnoDB handler for each partition, and call their create to create the real table.
comments powered by Disqus