5: information_schema Is Not the InnoDB Data Dictionary | Thinking About the MySQL Kernel, Beginner Series

Last time we saw that some pages in the InnoDB buffer pool are in use, and that some of those are used by the data dictionary. So what is the data dictionary? bingxi and alex keep thinking.

## 1) information_schema is not the InnoDB data dictionary

bingxi: “alex, I don't think information_schema stores the data dictionary. To be precise, let me rephrase: information_schema is not the InnoDB data dictionary.”

alex: “Right. InnoDB has always had the concept of a data dictionary, whereas information_schema only appeared with MySQL 5. So information_schema is not the InnoDB data dictionary.”

bingxi: “alex, that argument is a bit of a stretch. Let's start with an example. The manual contains this passage:

23.4. The INFORMATION_SCHEMA STATISTICS Table
The STATISTICS table provides information about table indexes.

In other words, information_schema.statistics holds information about table indexes. Let's create a table t1 in the test database with an index on id:

~~~
create table test.t1
(
  id int,
  name varchar(20),
  key it1id(id)
) engine=innodb;
~~~

Next we query the statistics table for t1's index information:

~~~
mysql> select * from information_schema.statistics where table_name='t1' \G;
*************************** 1. row ***************************
TABLE_CATALOG: NULL
 TABLE_SCHEMA: test
   TABLE_NAME: t1
   NON_UNIQUE: 1
 INDEX_SCHEMA: test
   INDEX_NAME: it1id
 SEQ_IN_INDEX: 1
  COLUMN_NAME: id
    COLLATION: A
  CARDINALITY: 0
     SUB_PART: NULL
       PACKED: NULL
     NULLABLE: YES
   INDEX_TYPE: BTREE
      COMMENT:
1 row in set (0.02 sec)

ERROR: No query specified
~~~

The query does return index information. But does t1 really have only one index? I'll keep you in suspense for now; we will come back to that when we discuss the InnoDB data dictionary. For now, focus on the it1id index. The output does show some information about the index, but this is not a data dictionary table: it only lets users inspect things from the outside, and the MySQL kernel itself cannot work from it. For example, where in the data file is the index stored? Without knowing the root page, there is no way to use the index. Now look at what the real InnoDB data dictionary contains (see D:/mysql-5.1.7-beta/storage/innobase/include/dict0mem.h):

~~~
/* Data structure for an index */
struct dict_index_struct{
	……
	dict_table_t*	table;	// points to the dictionary entry of the owning table
	ulint		space;	// the space the index lives in
	……
	dict_tree_t*	tree;	// the index tree structure
	……
};

/* Data structure for an index tree */
struct dict_tree_struct{
	……
	ulint		space;	// the space the index lives in
	ulint		page;	// page number of the index root node
	……
};
~~~

With space and page we can actually get at the index.”

alex: “Agreed, that's right. show create also reveals that these tables are temporary tables.

~~~
mysql> show create table information_schema.tables \G;
*************************** 1. row ***************************
       Table: TABLES
Create Table: CREATE TEMPORARY TABLE `TABLES` (
  `TABLE_CATALOG` varchar(512) default NULL,
  ……
) ENGINE=MEMORY DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

ERROR: No query specified
~~~
”

bingxi: “Yes.”

## 2) Analyzing the contents of information_schema

alex: “bingxi, even though information_schema is not the InnoDB data dictionary, let's still explore the code behind it. The main source files are:

D:/mysql-5.1.7-beta/sql/sql_show.h
D:/mysql-5.1.7-beta/sql/sql_show.cpp
”

bingxi: “alex, the file name contains show. Does that mean show status, show variables, show processlist and so on are also executed in this file?”

alex: “Yes, exactly. Let's begin with two data structures. First, the schema_tables array.

~~~
ST_SCHEMA_TABLE schema_tables[]=
{
  {"CHARACTER_SETS", charsets_fields_info, create_schema_table,
   fill_schema_charsets, make_character_sets_old_format, 0, -1, -1, 0},
  ……
  {"STATUS", variables_fields_info, create_schema_table, fill_status,
   make_old_format, 0, -1, -1, 1},
  {"TABLES", tables_fields_info, create_schema_table,
   get_all_tables, make_old_format, get_schema_tables_record, 1, 2, 0},
  {"TABLE_CONSTRAINTS", table_constraints_fields_info, create_schema_table,
   get_all_tables, 0, get_schema_constraints_record, 3, 4, 0},
  ……
};
~~~

The array has 26 elements, while information_schema in 5.1.7 has only 22 tables. That is understandable: the array contains entries such as STATUS and VARIABLES, which do not exist as tables under information_schema (the STATUS entry above, for instance, has its last field, hidden, set to 1). We reach them through show status and show variables instead. Back to the array: each element is a value of the following struct.

~~~
typedef struct st_schema_table
{
  const char* table_name;
  ST_FIELD_INFO *fields_info;
  TABLE *(*create_table) (THD *thd, struct st_table_list *table_list);
  int (*fill_table) (THD *thd, struct st_table_list *tables, COND *cond);
  int (*old_format) (THD *thd, struct st_schema_table *schema_table);
  int (*process_table) (THD *thd, struct st_table_list *tables,
                        TABLE *table, bool res, const char *base_name,
                        const char *file_name);
  int idx_field1, idx_field2;
  bool hidden;
} ST_SCHEMA_TABLE;
~~~

Take the tables table as an example:

~~~
{"TABLES", tables_fields_info, create_schema_table,
 get_all_tables, make_old_format, get_schema_tables_record, 1, 2, 0},
~~~

Its tables_fields_info is defined as follows.

~~~
ST_FIELD_INFO tables_fields_info[]=
{
  {"TABLE_CATALOG", FN_REFLEN, MYSQL_TYPE_STRING, 0, 1, 0},
  {"TABLE_SCHEMA", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, 0},
  {"TABLE_NAME", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, "Name"},
  {"TABLE_TYPE", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, 0},
  {"ENGINE", NAME_LEN, MYSQL_TYPE_STRING, 0, 1, "Engine"},
  {"VERSION", 21, MYSQL_TYPE_LONG, 0, 1, "Version"},
  {"ROW_FORMAT", 10, MYSQL_TYPE_STRING, 0, 1, "Row_format"},
  {"TABLE_ROWS", 21, MYSQL_TYPE_LONG, 0, 1, "Rows"},
  {"AVG_ROW_LENGTH", 21, MYSQL_TYPE_LONG, 0, 1, "Avg_row_length"},
  {"DATA_LENGTH", 21, MYSQL_TYPE_LONG, 0, 1, "Data_length"},
  {"MAX_DATA_LENGTH", 21, MYSQL_TYPE_LONG, 0, 1, "Max_data_length"},
  {"INDEX_LENGTH", 21, MYSQL_TYPE_LONG, 0, 1, "Index_length"},
  {"DATA_FREE", 21, MYSQL_TYPE_LONG, 0, 1, "Data_free"},
  {"AUTO_INCREMENT", 21, MYSQL_TYPE_LONG, 0, 1, "Auto_increment"},
  {"CREATE_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Create_time"},
  {"UPDATE_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Update_time"},
  {"CHECK_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Check_time"},
  {"TABLE_COLLATION", 64, MYSQL_TYPE_STRING, 0, 1, "Collation"},
  {"CHECKSUM", 21, MYSQL_TYPE_LONG, 0, 1, "Checksum"},
  {"CREATE_OPTIONS", 255, MYSQL_TYPE_STRING, 0, 1, "Create_options"},
  {"TABLE_COMMENT", 80, MYSQL_TYPE_STRING, 0, 0, "Comment"},
  {0, 0, MYSQL_TYPE_STRING, 0, 0, 0}
};
~~~

This describes the columns of the tables table. Ignoring the terminating row '{0, 0, MYSQL_TYPE_STRING, 0, 0, 0}', compare it with the output of desc tables; the two match.”

bingxi: “Agreed. Let's walk through an example, using show status.

~~~
{"STATUS", variables_fields_info, create_schema_table, fill_status,
 make_old_format, 0, -1, -1, 1},
// Comparing this with the struct definition, we can tell that:
// create_schema_table fills the slot: TABLE *(*create_table)
// fill_status fills the slot: int (*fill_table)
// make_old_format fills the slot: int (*old_format); no need to debug this one for now
~~~

First look at the function mysql_schema_table, which calls create_schema_table.

~~~
int mysql_schema_table(THD *thd, LEX *lex, TABLE_LIST *table_list)
{
  ……
  // table_list->schema_table is a st_schema_table structure.
  // Its value is: {"STATUS", variables_fields_info, create_schema_table, fill_status,
  //                make_old_format, 0, -1, -1, 1},
  // so calling create_table here means calling create_schema_table.
  if (!(table= table_list->schema_table->create_table(thd, table_list)))
  {
    DBUG_RETURN(1);
  }
  ……
}
~~~

What does create_schema_table do? As the name suggests, it creates a table: the temporary table for status. The table has two columns, Variable_name and Value. See the code below.

~~~
TABLE *create_schema_table(THD *thd, TABLE_LIST *table_list)
{
  ……
  List<Item> field_list;
  ST_SCHEMA_TABLE *schema_table= table_list->schema_table;
  ST_FIELD_INFO *fields_info= schema_table->fields_info;
  ……
  // fields_info is schema_table->fields_info and lists the columns being queried.
  // The first fields_info->field_name is 'Variable_name';
  // an Item instance is created from it and appended to field_list.
  // The second fields_info->field_name is 'Value';
  // another Item is created from it and appended to field_list as well.
  // field_list therefore describes the columns of the temporary table.
  for (; fields_info->field_name; fields_info++)
  {
    ……
    // the elided code hides the differences between fields_info->field_type values
    item->max_length= fields_info->field_length * cs->mbmaxlen;
    item->set_name(fields_info->field_name,
                   strlen(fields_info->field_name), cs);
    ……
    field_list.push_back(item);
    item->maybe_null= fields_info->maybe_null;
    field_count++;
  }
  TMP_TABLE_PARAM *tmp_table_param =
    (TMP_TABLE_PARAM*) (thd->calloc(sizeof(TMP_TABLE_PARAM)));
  tmp_table_param->init();
  tmp_table_param->table_charset= cs;
  tmp_table_param->field_count= field_count;
  tmp_table_param->schema_table= 1;
  SELECT_LEX *select_lex= thd->lex->current_select;
  // Call create_tmp_table.
  // field_list is among the arguments, so the column list is available.
  // table_list->alias has the value STATUS.
  // This is where the temporary table gets created.
  if (!(table= create_tmp_table(thd, tmp_table_param,
                                field_list, (ORDER*) 0, 0, 0,
                                (select_lex->options | thd->options |
                                 TMP_TABLE_ALL_COLUMNS),
                                HA_POS_ERROR, table_list->alias)))
  ……
}
~~~

The temporary table now exists, but an empty temporary table is not enough, so it has to be filled with values when the query executes.

~~~
void JOIN::exec()
{
  ……
  if ((curr_join->select_lex->options & OPTION_SCHEMA_TABLE) &&
      get_schema_tables_result(curr_join))
  {
    DBUG_VOID_RETURN;
  }
  ……
}
~~~

get_schema_tables_result is where fill_status gets called:

~~~
bool get_schema_tables_result(JOIN *join)
{
  ……
  for (JOIN_TAB *tab= join->join_tab; tab < tmp_join_tab; tab++)
  {
    ……
      // table_list->schema_table is a st_schema_table structure.
      // Its value is: {"STATUS", variables_fields_info, create_schema_table, fill_status,
      //                make_old_format, 0, -1, -1, 1},
      // so calling fill_table here means calling fill_status.
      if (table_list->schema_table->fill_table(thd, table_list,
                                               tab->select_cond))
        result= 1;
      table_list->is_schema_table_processed= TRUE;
    ……
  }
  ……
}
~~~

So fill_status runs and populates the table.

~~~
int fill_status(THD *thd, TABLE_LIST *tables, COND *cond)
{
  DBUG_ENTER("fill_status");
  LEX *lex= thd->lex;
  const char *wild= lex->wild ? lex->wild->ptr() : NullS;
  int res= 0;
  STATUS_VAR tmp;
  pthread_mutex_lock(&LOCK_status);
  // For show global, calc_sum_of_all_status has to be run to sum the counters.
  if (lex->option_type == OPT_GLOBAL)
    calc_sum_of_all_status(&tmp);
  // Insert the rows.
  res= show_status_array(thd, wild,
                         (SHOW_VAR *)all_status_vars.buffer,
                         OPT_GLOBAL,
                         (lex->option_type == OPT_GLOBAL ?
                          &tmp: &thd->status_var), "", tables->table);
  pthread_mutex_unlock(&LOCK_status);
  DBUG_RETURN(res);
}
~~~

To see this more clearly, look at show_status_array as well.

~~~
static bool show_status_array(THD *thd, const char *wild,
                              SHOW_VAR *variables,
                              enum enum_var_type value_type,
                              struct system_status_var *status_var,
                              const char *prefix, TABLE *table)
{
  // The variables argument is the global: (SHOW_VAR *)all_status_vars.buffer,
  // so we loop over every variable in it.
  for (; variables->name; variables++)
  {
    ……
        restore_record(table, s->default_values);
        table->field[0]->store(name_buffer, strlen(name_buffer),
                               system_charset_info);
        table->field[1]->store(pos, (uint32) (end - pos), system_charset_info);
        // Insert the record into the table.
        if (schema_table_store_record(thd, table))
          DBUG_RETURN(TRUE);
    ……
  }
  ……
}
~~~

By this point the status table holds all of its data. Execution then continues, and the rows simply get displayed.”

alex: “I see. The other tables work in a similar way, though with some differences; tables, for example, has to scan the data directory.”

bingxi: “Yes, they are all much the same.”

alex: “My suggestion is to set a breakpoint on every function in that cpp file and then run each statement once, for example select * from information_schema.tables \G. Go through all 22 tables in the schema this way, and also try the show statements: show processlist, show variables, show create table test.t1, and so on.”

bingxi: “Yes.”

alex: “It's already midnight. Get some rest. Good night.”

bingxi: “Good night.”
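
As a recap of the control flow traced above (look up an entry in schema_tables, call its create_table pointer, then call its fill_table pointer), here is a minimal standalone sketch of the same dispatch pattern. It is not MySQL code: MiniTable, MiniSchemaTable, MiniStatusVar, mini_schema_tables, run_schema_query and the two status values are all hypothetical names and data invented for illustration, and the real signatures take THD, TABLE_LIST and COND arguments that are left out here.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's temporary TABLE: column names plus rows.
struct MiniTable {
  std::vector<std::string> columns;
  std::vector<std::vector<std::string>> rows;
};

// Hypothetical, cut-down analogue of ST_SCHEMA_TABLE.
struct MiniSchemaTable {
  const char *table_name;
  const char *const *fields_info;                      // null-terminated column names
  MiniTable (*create_table)(const MiniSchemaTable *);  // builds the empty temp table
  int (*fill_table)(MiniTable *);                      // inserts rows; 0 on success
  bool hidden;                                         // reachable only through SHOW
};

// Shared creator, in the spirit of create_schema_table: walk fields_info up to
// the terminating null entry and turn every name into a column.
static MiniTable mini_create_schema_table(const MiniSchemaTable *st) {
  MiniTable t;
  for (const char *const *f = st->fields_info; *f; ++f) t.columns.push_back(*f);
  return t;
}

// Made-up status counters standing in for all_status_vars.
struct MiniStatusVar {
  const char *name;
  long value;
};
static const MiniStatusVar mini_status_vars[] = {{"Questions", 42},
                                                 {"Uptime", 3600}};

// Per-table filler, in the spirit of fill_status: one row per variable.
static int mini_fill_status(MiniTable *t) {
  for (const MiniStatusVar &v : mini_status_vars)
    t->rows.push_back({v.name, std::to_string(v.value)});
  return 0;
}

static const char *const mini_variables_fields[] = {"Variable_name", "Value",
                                                    nullptr};

// The dispatch table; the all-null entry terminates it.
static const MiniSchemaTable mini_schema_tables[] = {
    {"STATUS", mini_variables_fields, mini_create_schema_table,
     mini_fill_status, true},
    {nullptr, nullptr, nullptr, nullptr, false}};

// Find an entry by name, then dispatch through its function pointers:
// create the temp table first, fill it second.
static bool run_schema_query(const char *name, MiniTable *out) {
  for (const MiniSchemaTable *st = mini_schema_tables; st->table_name; ++st) {
    if (std::strcmp(st->table_name, name) != 0) continue;
    *out = st->create_table(st);
    return st->fill_table(out) == 0;
  }
  return false;  // unknown table name
}
```

Calling run_schema_query("STATUS", &t) on an empty MiniTable t leaves it with the two columns Variable_name and Value and one row per made-up counter. Supporting another table in this sketch would only take a new array entry with its own filler, which mirrors why most entries in schema_tables can share create_schema_table.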