The storage requirements for data vary according to the storage engine being used for the table in question. Different storage engines use different methods for recording the raw data and different data types. In addition, some engines may compress the information in a given row, on either a per-column or whole-row basis, making exact calculation of the storage requirements for a given table or column structure more difficult.
However, all storage engines must communicate and exchange information on a given row within a table using the same structure, and this information is consistent, irrespective of the storage engine used to write the information to disk.
This section includes guidelines and information for the storage requirements for each of the data types supported by MySQL, including details of the internal format and the sizes used by storage engines that use a fixed-size representation for different types. Information is listed by category or storage engine.
The maximum size of a row in a MyISAM table is 65,535 bytes. This figure excludes BLOB or TEXT columns, which contribute only 9 to 12 bytes toward this size.
The internal representation of a table also has a maximum row size of 65,535 bytes, even if the storage engine is capable of supporting larger rows. For BLOB and TEXT data, the information is stored internally in a different area of memory than the row buffer. Different storage engines handle the allocation and storage of this data in different ways, according to the method they use for handling the corresponding types. See Chapter 13, Storage Engines, for more information.
For tables using the NDBCLUSTER storage engine, there is the factor of 4-byte alignment to be taken into account when calculating storage requirements. This means that all NDB data storage is done in multiples of 4 bytes. Thus, a column value that would take 15 bytes in a table using a storage engine other than NDB requires 16 bytes in an NDB table. This requirement applies in addition to any other considerations that are discussed in this section. For example, in NDBCLUSTER tables, the TINYINT, SMALLINT, MEDIUMINT, and INTEGER (INT) column types each require 4 bytes of storage per record due to the alignment factor.
An exception to this rule is the BIT type, which is not 4-byte aligned. In MySQL Cluster tables, a BIT(M) column takes M bits of storage space. However, if a table definition contains 1 or more BIT columns (up to 32 BIT columns), then NDBCLUSTER reserves 4 bytes (32 bits) per row for these. If a table definition contains more than 32 BIT columns (up to 64 such columns), then NDBCLUSTER reserves 8 bytes (that is, 64 bits) per row.
        In addition, while a NULL itself does not
        require any storage space,
        NDBCLUSTER reserves 4 bytes per row
        if the table definition contains any columns defined as
        NULL, up to 32 NULL
        columns. (If a MySQL Cluster table is defined with more than 32
        NULL columns up to 64 NULL
        columns, then 8 bytes per row is reserved.)
      
      When calculating storage requirements for MySQL Cluster tables,
      you must also remember that every table using the
      NDBCLUSTER storage engine requires a
      primary key; if no primary key is defined by the user, then a
      “hidden” primary key will be created by
      NDB. This hidden primary key consumes
      31-35 bytes per table record.
    
      You may find the ndb_size.pl utility to be
      useful for estimating NDB storage
      requirements. This Perl script connects to a current MySQL
      (non-Cluster) database and creates a report on how much space that
      database would require if it used the
      NDBCLUSTER storage engine. See
      ndb_size.pl, for more
      information.
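The alignment and reservation rules above can be collected into a rough per-row estimator. The following Python sketch is illustrative only (the function names are invented here, and the hidden-primary-key cost is taken as the worst case of the 31-35 byte range given above):

```python
def ndb_column_bytes(base_bytes):
    """Round a column's base storage up to the next multiple of 4 bytes
    (NDBCLUSTER 4-byte alignment)."""
    return (base_bytes + 3) // 4 * 4

def ndb_row_overhead(num_bit_columns=0, num_nullable_columns=0,
                     has_user_primary_key=True):
    """Per-row bytes reserved beyond the aligned column data.
    Assumption: 35 bytes is used as the worst case for the hidden
    primary key's stated 31-35 byte cost."""
    overhead = 0
    if num_bit_columns:                # 4 bytes per row for up to 32 BIT
        overhead += 4 if num_bit_columns <= 32 else 8   # columns, 8 for 33-64
    if num_nullable_columns:           # same scheme for NULL-able columns
        overhead += 4 if num_nullable_columns <= 32 else 8
    if not has_user_primary_key:       # hidden primary key created by NDB
        overhead += 35
    return overhead

# A 15-byte value occupies 16 bytes; a 1-byte TINYINT still occupies 4.
```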
    
      Storage Requirements for Numeric Types in
      MyISAM
    
| Data Type | Storage Required |
| TINYINT | 1 byte |
| SMALLINT | 2 bytes |
| MEDIUMINT | 3 bytes |
| INT, INTEGER | 4 bytes |
| BIGINT | 8 bytes |
| FLOAT(p) | 4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53 |
| FLOAT | 4 bytes |
| DOUBLE [PRECISION], REAL | 8 bytes |
| DECIMAL(M,D), NUMERIC(M,D) | Varies; see following discussion |
| BIT(M) | approximately (M+7)/8 bytes |
      Values for DECIMAL (and
      NUMERIC) columns are represented
      using a binary format that packs nine decimal (base 10) digits
into four bytes. Storage for the integer and fractional parts of
each value is determined separately. Each multiple of nine digits
      requires four bytes, and the “leftover” digits
      require some fraction of four bytes. The storage required for
      excess digits is given by the following table.
    
| Leftover Digits | Number of Bytes | 
| 0 | 0 | 
| 1 | 1 | 
| 2 | 1 | 
| 3 | 2 | 
| 4 | 2 | 
| 5 | 3 | 
| 6 | 3 | 
| 7 | 4 | 
| 8 | 4 | 
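Putting the nine-digit packing rule and the leftover-digit table together gives a small calculator. A minimal Python sketch (the function name is illustrative, not from the manual):

```python
def decimal_storage_bytes(M, D):
    """Storage for DECIMAL(M, D): the integer part (M - D digits) and the
    fractional part (D digits) are packed separately, at 4 bytes per full
    group of nine digits plus the leftover-digit table above."""
    leftover = [0, 1, 1, 2, 2, 3, 3, 4, 4]   # bytes for 0..8 leftover digits
    int_digits = M - D
    return ((int_digits // 9) * 4 + leftover[int_digits % 9]
            + (D // 9) * 4 + leftover[D % 9])
```

For example, DECIMAL(18,9) packs nine digits on each side of the point into 4 bytes apiece, for 8 bytes total.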
      Storage Requirements for Date and Time Types
      in MyISAM
    
| Data Type | Storage Required |
| DATE | 3 bytes |
| TIME | 3 bytes |
| DATETIME | 8 bytes |
| TIMESTAMP | 4 bytes |
| YEAR | 1 byte |

The storage requirements shown in the table arise from the way that MySQL represents temporal values:

- DATE: A three-byte integer packed as DD + MM×32 + YYYY×16×32
- TIME: A three-byte integer packed as DD×24×3600 + HH×3600 + MM×60 + SS
- DATETIME: Eight bytes:
  - A four-byte integer packed as YYYY×10000 + MM×100 + DD
  - A four-byte integer packed as HH×10000 + MM×100 + SS
- TIMESTAMP: A four-byte integer representing seconds UTC since the epoch ('1970-01-01 00:00:00' UTC)
- YEAR: A one-byte integer
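The packing formulas above can be checked directly. A short Python sketch (function names are illustrative):

```python
def pack_date(yyyy, mm, dd):
    """DATE: three-byte integer packed as DD + MM*32 + YYYY*16*32."""
    return dd + mm * 32 + yyyy * 16 * 32

def pack_datetime(yyyy, mm, dd, hh, mi, ss):
    """DATETIME: two four-byte integers, one for the date part and one
    for the time part."""
    date_part = yyyy * 10000 + mm * 100 + dd
    time_part = hh * 10000 + mi * 100 + ss
    return date_part, time_part

# The DATETIME halves are human-readable when printed in decimal:
# 2006-05-04 13:45:30 packs to (20060504, 134530).
```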
      Storage Requirements for String Types in
      MyISAM
    
      In the following table, M represents
      the declared column length in characters for nonbinary string
      types and bytes for binary string types.
      L represents the actual length in bytes
      of a given string value.
    
| Data Type | Storage Required |
| CHAR(M) | M×w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set |
| BINARY(M) | M bytes, 0 <= M <= 255 |
| VARCHAR(M), VARBINARY(M) | L + 1 bytes if column values require 0 to 255 bytes, L + 2 bytes if values may require more than 255 bytes |
| TINYBLOB, TINYTEXT | L + 1 bytes, where L < 2^8 |
| BLOB, TEXT | L + 2 bytes, where L < 2^16 |
| MEDIUMBLOB, MEDIUMTEXT | L + 3 bytes, where L < 2^24 |
| LONGBLOB, LONGTEXT | L + 4 bytes, where L < 2^32 |
| ENUM('value1','value2',...) | 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) |
| SET('value1','value2',...) | 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum) |
      Variable-length string types are stored using a length prefix plus
      data. The length prefix requires from one to four bytes depending
      on the data type, and the value of the prefix is
      L (the byte length of the string). For
      example, storage for a MEDIUMTEXT
      value requires L bytes to store the
      value plus three bytes to store the length of the value.
    
      To calculate the number of bytes used to store a particular
      CHAR,
      VARCHAR, or
      TEXT column value, you must take
      into account the character set used for that column and whether
      the value contains multi-byte characters. In particular, when
      using the utf8 Unicode character set, you must
      keep in mind that not all characters use the same number of bytes
      and can require up to three bytes per character. For a breakdown
      of the storage used for different categories of
      utf8 characters, see
      Section 9.1.10, “Unicode Support”.
    
      VARCHAR,
      VARBINARY, and the
      BLOB and
      TEXT types are variable-length
      types. For each, the storage requirements depend on these factors:
    
- The actual length of the column value
- The column's maximum possible length
- The character set used for the column, because some character sets contain multi-byte characters
      For example, a VARCHAR(255) column can hold a
      string with a maximum length of 255 characters. Assuming that the
      column uses the latin1 character set (one byte
      per character), the actual storage required is the length of the
      string (L), plus one byte to record the
      length of the string. For the string 'abcd',
      L is 4 and the storage requirement is
      five bytes. If the same column is instead declared to use the
      ucs2 double-byte character set, the storage
      requirement is 10 bytes: The length of 'abcd'
      is eight bytes and the column requires two bytes to store lengths
      because the maximum length is greater than 255 (up to 510 bytes).
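The worked example above generalizes to a small helper. A Python sketch under the stated assumptions (fixed bytes-per-character; the function name is invented for illustration):

```python
def varchar_storage_bytes(value_chars, max_chars, bytes_per_char=1):
    """Storage for a VARCHAR value: L data bytes plus a length prefix of
    1 byte if the column's maximum byte length is at most 255, else 2."""
    max_bytes = max_chars * bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return value_chars * bytes_per_char + prefix

# 'abcd' in a latin1 VARCHAR(255): 4 + 1 = 5 bytes.
# 'abcd' in a ucs2 VARCHAR(255): 8 + 2 = 10 bytes (max length is 510 bytes).
```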
    
        The effective maximum number of bytes that
        can be stored in a VARCHAR or
        VARBINARY column is subject to
        the maximum row size of 65,535 bytes, which is shared among all
        columns. For a VARCHAR column
        that stores multi-byte characters, the effective maximum number
        of characters is less. For example,
        utf8 characters can require up to three bytes
        per character, so a VARCHAR
        column that uses the utf8 character set can
        be declared to be a maximum of 21,844 characters.
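The 21,844-character figure follows directly from the row-size limit. A one-line check in Python (assuming a 2-byte length prefix and 3 bytes per utf8 character):

```python
# 65,535-byte row limit, minus a 2-byte length prefix, at up to
# 3 bytes per utf8 character:
max_utf8_chars = (65535 - 2) // 3
print(max_utf8_chars)  # 21844
```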
      
      The size of an ENUM object is
      determined by the number of different enumeration values. One byte
      is used for enumerations with up to 255 possible values. Two bytes
      are used for enumerations having between 256 and 65,535 possible
      values. See Section 10.4.4, “The ENUM Type”.
    
The size of a SET object is determined by the number of different set members. If the set size is N, the object occupies (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64 members. See Section 10.4.5, “The SET Type”.
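The ENUM and SET sizing rules can be sketched together in Python (function names are illustrative, not from the manual):

```python
def enum_storage_bytes(num_values):
    """1 byte for up to 255 enumeration values; 2 bytes for 256-65,535."""
    return 1 if num_values <= 255 else 2

def set_storage_bytes(num_members):
    """(N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes
    (a SET can have at most 64 members)."""
    raw = (num_members + 7) // 8
    for size in (1, 2, 3, 4, 8):
        if raw <= size:
            return size
```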
    


User Comments
Had a lot of trouble finding the maximum table size in bytes for capacity planning. More specifically it was InnoDB tables that I had a problem with. Average row size is good, but I wanted maximum row size.
I checked several products and could not find what I wanted. Some of the tables I deal with are 300+ fields and so manual calculation was not practical.
So I wrote a little perl script that does it. Thought it might be of some use, so I include it here...it does all field types except enum/set types. It does not calculate anything regarding index size.
Just do a mysqldump -d (just the schema) of your DB to a file, and run this perl script specifying the schema file as the only argument.
----------------------------------------------------------------
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1,
"SMALLINT"=>2,
"MEDIUMINT"=>3,
"INT"=>4,
"BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3,
"DATETIME"=>8,
"TIMESTAMP"=>4,
"TIME"=>3,
"YEAR"=>1,
"CHAR"=>'$M',
"VARCHAR"=>'$M+1',
"TINYBLOB"=>'$M+1',
"TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2',
"TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3',
"MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4',
"LONGTEXT"=>'$M+4');
my $D;
my $M;
my $dt;
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>)
{
chomp;
if ( $_ =~ s/create table[ ]*([a-zA-Z_]*).*/$1/i )
{
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
Here's a modification of Marc's script above that also handles ENUM's. Enjoy.
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3,
"INT"=>4, "BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "TIME"=>3, "YEAR"=>1,
"CHAR"=>'$M', "VARCHAR"=>'$M+1',
"ENUM"=>1,
"TINYBLOB"=>'$M+1', "TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2', "TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3', "MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4', "LONGTEXT"=>'$M+4');
my ($D, $M, $dt);
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>) {
chomp;
if ( $_ =~ s/create table[ ]`*([a-zA-Z_]*).*`/$1/i ) {
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|ENUM[ ]*\(*['A-Za-z_,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/\(.*\)//g;
$dt =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
The above scripts do not take several important pieces of information into account (so they are outdated):
1. The database/table encoding.
If you have utf8 encoding, a varchar(100) can take up to 300 bytes (up to 3 bytes per utf8 character).
"[...]As of MySQL 4.1, to calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character."
2. enum can have either 1 or 2 bytes
"[...]The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values."
Here I wrote another script based on Marc's, that takes into account what Alex wrote and more.
It calculates VARCHAR/CHAR/TEXT sizes taking CHARSET or COLLATION into account, properly calculates SET and ENUM sizes, and calculates DECIMAL/NUMERIC according to the >5.0.3 packed format.
It also calculates the least row byte size for tables with dynamic row length.
It uses "mysql" and "mysqldump" tools internally.
Any argument to this script is provided as an argument for mysqldump.
Example: {scriptname} --all-databases
Please report any bug, especially when it comes to size calculations. Enjoy.
----------- copy here --------------
#!/usr/bin/perl
use strict;
$| = 1;
my %DataType = (
"TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3, "INT"=>4, "INTEGER"=>4, "BIGINT"=>8,
"FLOAT"=>'$M<=24?4:8', "DOUBLE"=>8,
"DECIMAL"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"NUMERIC"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"BIT"=>'($M+7)>>3',
"DATE"=>3, "TIME"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "YEAR"=>1,
"BINARY"=>'$M',"CHAR"=>'$M*$CL',
"VARBINARY"=>'$M+($M>255?2:1)', "VARCHAR"=>'$M*$CL+($M>255?2:1)',
"ENUM"=>'$M>255?2:1', "SET"=>'($M+7)>>3',
"TINYBLOB"=>9, "TINYTEXT"=>9,
"BLOB"=>10, "TEXT"=>10,
"MEDIUMBLOB"=>11, "MEDIUMTEXT"=>11,
"LONGBLOB"=>12, "LONGTEXT"=>12
);
my %DataTypeMin = (
"VARBINARY"=>'($M>255?2:1)', "VARCHAR"=>'($M>255?2:1)'
);
my ($D, $M, $S, $C, $L, $dt, $dp ,$bc, $CL);
my $fieldCount = 0;
my $byteCount = 0;
my $byteCountMin = 0;
my @fields = ();
my $fieldName;
my $tableName;
my $defaultDbCL = 1;
my $defaultTableCL = 1;
my %charsetMaxLen;
my %collationMaxLen;
open (CHARSETS, "mysql -B --skip-column-names information_schema -e 'select CHARACTER_SET_NAME,MAXLEN from CHARACTER_SETS;' |");
%charsetMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <CHARSETS>);
close CHARSETS;
open (COLLATIONS, "mysql -B --skip-column-names information_schema -e 'select COLLATION_NAME,MAXLEN from CHARACTER_SETS INNER JOIN COLLATIONS USING(CHARACTER_SET_NAME);' |");
%collationMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <COLLATIONS>);
close COLLATIONS;
open (TABLEINFO, "mysqldump -d --compact ".join(" ",@ARGV)." |");
while (<TABLEINFO>) {
chomp;
if ( ($S,$C) = /create database.*?`([^`]+)`.*default\scharacter\sset\s+(\w+)/i ) {
$defaultDbCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : 1;
print "Database: $S".($C?" DEFAULT":"").($C?" CHARSET $C":"")." (bytes per char: $defaultDbCL)\n\n";
next;
}
if ( /^create table\s+`([^`]+)`.*/i ) {
$tableName = $1;
@fields = ();
next;
}
if ( $tableName && (($C,$L) = /^\)(?:.*?default\scharset=(\w+))?(?:.*?collate=(\w+))?/i) ) {
$defaultTableCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultDbCL);
print "Table: $tableName".($C||$L?" DEFAULT":"").($C?" CHARSET $C":"").($L?" COLLATION $L":"")." (bytes per char: $defaultTableCL)\n";
$tableName = "";
$fieldCount = 0;
$byteCount = 0;
$byteCountMin = 0;
while ($_ = shift @fields) {
if ( ($fieldName,$dt,$dp,$M,$D,$S,$C,$L) = /\s\s`([^`]+)`\s+([a-z]+)(\((\d+)(?:,(\d+))?\)|\((.*)\))?(?:.*?character\sset\s+(\w+))?(?:.*?collate\s+(\w+))?/i ) {
$dt = uc $dt;
if (exists $DataType{$dt}) {
if (length $S) {
$M = ($S =~ s/(\'.*?\'(?!\')(?=,|$))/$1/g);
$dp = "($M : $S)"
}
$D = 0 if !$D;
$CL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultTableCL);
$bc = eval($DataType{$dt});
$byteCount += $bc;
$byteCountMin += exists $DataTypeMin{$dt} ? $DataTypeMin{$dt} : $bc;
} else {
$bc = "??";
}
$fieldName.="\t" if length($fieldName) < 8;
print "bytes:\t".$bc."\t$fieldName\t$dt$dp".($C?" $C":"").($L?" COLL $L":"")."\n";
++$fieldCount;
}
}
print "total:\t$byteCount".($byteCountMin!=$byteCount?"\tleast: $byteCountMin":"\t\t")."\tcolumns: $fieldCount\n\n";
next;
}
push @fields, $_;
}
close TABLEINFO;