Document Version: 1.2
Date: March 5, 2025
Classification: Technical Reference
Subject: Internal Structure of SQL Server STATS_STREAM Binary Format
Document Type: Read-Only Format Specification
Five principal regions:
Sequential reading; random access to steps via offset table. Size: 64 bytes (empty) to several KB (200 steps).
Addressing:
off1 position)off1 = core header start = L1 - L2 (from global header).
Little-endian. Natural alignment where possible. Reserved fields must be zero.
First 64 bytes: format ID, sizes, data type descriptors, validation fields. Present in all blobs.
Four-byte format version; value 0x00000001. Format signature for validation.
Four-byte number of vectors (columns). 1 = single-column; 2–32 = multi-column.
Reserved regions (zeros). Non-zero causes rejection.
Four-byte integrity validation field. SQL Server rejects blobs with incorrect values (Error 9105).
Four-byte total blob length in bytes (all headers, vectors, histogram).
Four-byte size of content after core header start. L2 = L1 - off1.
Overlaps first vector descriptor. Primary column data type:
Offset 44: integer types = off1 - 8; other types = system type ID.
Four-byte string-type identifier. Zero for non-string types. For string types (VARCHAR 167, NVARCHAR 231, CHAR 175, NCHAR 239, TEXT 35, NTEXT 99), when the descriptor region does not overlap bytes 56–59: value 0x04d00800 (or 0x00d00800 for max/large length). Fixture analysis: all non-zero values in sampled blobs are 0x04d00800 and correspond to string-column statistics. Extract from original when non-zero if not string-type.
A four-byte validation field where byte 0 functions as a bitmask encoding various statistics modes and features:
Bit 0: Ascending key detection flag—set for ascending distributions, filtered statistics, and temporal tables
Bit 1: Distribution or memory-optimized flag—indicates special distribution handling or memory-optimized table statistics
Bit 2: Temporal history exclusive flag—set only for temporal history tables
Bit 3: Multi-purpose mode indicator—set for Trace Flag 9612 mode, ascending key mode, multi-column statistics, temporal tables, and persisted samples
Bit 4: Multi-column indicator—set for statistics with two or more columns, also for temporal history
Bit 5: Complex flag—set for multi-column statistics, TF9612 mode, temporal history, and combinations of persisted sample with filtered statistics
Bit 6: Advanced features flag—set for TF9612 mode, multi-column statistics with three or more columns, indexed views, columnstore indexes, and temporal history
Bit 7: Incremental statistics flag—set conditionally for incremental statistics created with sampling (not full scan)
Common bitmask patterns include 0x00 (baseline uniform distribution), 0x0b (ascending distribution), 0x68 (Trace Flag 9612 or multi-column with 3+ columns), 0x38 (two-column statistics), and 0x76 (temporal history tables).
Three four-byte fields; bytes 72-75 are well-established; bytes 64-71 are engine-specific:
Vector descriptors provide detailed data type metadata for each column in the statistics. These variable-length structures begin at byte 40 of the global header and extend for 12 bytes per column (16 bytes if the type includes collation information). The first vector descriptor overlaps with global header bytes 40-51.
Each vector descriptor contains:
Common system type IDs include:
Multi-column: vector descriptors repeat per column; count in global header. Padding to alignment before core header.
Core header at off1 = L1 - L2; 357 bytes fixed layout.
Four-byte constant. Global header 64-67 is not 0x07 in sampled blobs; core header +0 to +3 may be 0x00000007 in engine blobs—validate per implementation.
Eight-byte last statistics update time (SQL Server internal datetime).
Two eight-byte integer fields:
A complex region containing density values used for selectivity estimation:
Prefix densities for multi-column selectivity.
Critical fields describing histogram structure:
Four-byte reserved field containing zeros across all analyzed blobs.
Array of 33 four-byte float values providing average key lengths for each prefix in multi-column statistics.
Single-byte flag encoding statistics features. Common values include:
Encodes data type, update status, features. Correlation: The bit at 0x02 predicts the histogram offset table position: (flag & 0x02) != 0 → table at off1+340; else → off1+356. See docs/FEATURE_FLAG_OFF1_324.md for definition and correlation summary.
Sixteen-byte structured field with complex encoding:
Byte +325: Format identifier, constant 0x04 in 99.9% of blobs. Exception: empty columnstore statistics (step count and row count both zero) use 0x00. Note that non-columnstore empty statistics use a completely different format with only 64-byte blobs containing no core header.
Bytes +326 to +339: Reserved padding containing zeros.
Byte +340: Bitmask encoding system type base value and feature flags. The bitmask formula is base_value_for_system_type OR feature_bits, where:
System type base values vary by data type:
0x38 in 47% of cases0x58 or 0xf9 in approximately 46% each0x20 or 0x2c in approximately 48% eachFeature bits overlay the base value:
Bits 1 and 7 both encode non-columnstore; exact separate meaning of each bit is not required for round-trip when using per-type observed values (e.g. INT 0x38, BIGINT 0x58, NVARCHAR 0x50). This byte cannot distinguish multi-column, incremental, persisted sample, norecompute when only one is set.
Only when the 16-byte metadata block is present (feature flag at +324 has bit 0x02 clear → offset table at off1+356): the eight bytes at off1+340..off1+347 are the histogram length — little-endian uint64 = total size of the histogram step region. Usually step_count × (step_size + 11) (11 = marker 4 + trailer 3 + overhead 4). Exceptions: IMAGE, NTEXT, TEXT.
When the block is absent (bit 0x02 set → table at off1+340), bytes 340–347 are the first eight bytes of the offset table (first step offset), not histogram length. See §6.2 and docs/FEATURE_FLAG_OFF1_324.md.
Only when the 16-byte block is present (offset table at off1+356): the next eight bytes are lifecycle/metadata:
histogram_length + N, little-endian uint32. N ∈ {8, 80, 156, 232, 448, 509} (e.g. +8 initial, +80 after INSERT, +156 after UPDATE).When the block is absent (table at off1+340), bytes 348–355 are bytes 9–16 of the offset table (second step offset), not gap. See §6.2 and docs/FEATURE_FLAG_OFF1_324.md.
Offset table: one entry per histogram step. Each entry is the absolute byte offset from blob start to that step (standard layout), or a relative offset in low-compression layouts.
One eight-byte little-endian integer per step. Size = step_count × 8 bytes (e.g. 1,600 for 200 steps).
The table starts at off1+340 or off1+356 only, as predicted by the feature flag at off1+324 (see implementation note below).
Offset values are absolute positions within the blob. Each table entry is a literal absolute byte offset: the eight-byte value at table start + N × 8 is the byte offset of step N. To access step N, read that value and seek to that position. A large value (e.g. near blob length) indicates the step region is placed near the end of the blob; a small value indicates placement nearer the core header. No encoding or decode step is applied to any entry.
Observed gap: In some engine-written blobs, reading every entry as a literal step offset yields wrong step positions: e.g. table entry 0 may not equal the byte offset of the first step marker in the blob, or entry 0 may be out of bounds, so a parser that seeks to entry 0 reads the wrong bytes or fails. When entry 1, 2, … fall inside the table’s byte range, treating them as step starts causes the parser to read entry bytes as step data. The engine still produces the correct histogram (as shown by DBCC SHOW_STATISTICS WITH HISTOGRAM).
Parser fallback (decision order): (1) Read the offset table at the predicted position (off1+340 or off1+356 from the feature flag). (2) If any entry is out of range or overflow, try the other table position (340 vs 356); if the literal table there is valid, use it and resolve overlap if needed. (3) When literal at both positions fails, use overlap-from-entry0: sequential layout (stride = 4 + stepSize) from the first table entry at the predicted position, or end-packed layout (step block at end of blob) for INT (marker 0x00000014), or a 200-step scan for known INT blobs. (4) For step_count = 1 only, when literal and overlap both fail, use the table-end-relative rule: step0 = tableEndExclusive + gap, where tableEndExclusive = tableStart + step_count×8; gap = 8 when feature_flag_324 = 0x11, else 16. The 4-byte marker at that offset must be INT-like (0x00000014 or 0x00140010).
Step block after table: In many blobs the first step lies at a fixed offset after the offset table: step0Offset = tableEndExclusive + gap, with gap 8 (feature_flag_324 = 0x11) or 16 (0x19, 0x1b, etc.). The fallback in (4) above implements this for single-step histograms when earlier steps fail.
Single-step histograms may omit the offset table or use offset value zero to indicate inline storage. Empty histograms (step count zero) result in no offset table presence.
Header step_count (+164, +168) can exceed the number of physically stored or decodable steps. Engine may reserve space or use a compressed representation. Decoded step count may be less than header step_count; layout logic must choose header value or decoded count.
Implementation note (offset table use). The feature flag at off1+324 determines the predicted table position: if (flag & 0x02) != 0 (e.g. 0x13, 0x1b), the 16-byte histogram-length and gap block is absent and the table starts at off1+340; if the bit is clear (e.g. 0x11, 0x19), the block is present and the table starts at off1+356. The table is only at one of these two positions. Parsers should try the predicted position first; if reading the table there yields out-of-bounds or invalid entries, try the other position (340 vs 356) before applying overlap or table-end-relative fallbacks. For step_count = 1, when both positions and overlap fail, use step0 = tableEndExclusive + gap (gap = 8 for flag 0x11, else 16).
The data space of the offset table is the logical region the engine uses for histogram table data. It comprises:
A parser that reads the table region only as contiguous 8-byte entries may fail to re-parse after in-place scaling when the engine has stored step payload in that range (overlap). Scaling tools should validate that no step payload overlaps the table byte range before overwriting, or use the same fallback order as above to re-locate steps.
Histogram steps contain the statistical distribution data used for cardinality estimation. Each step represents a value range with associated row counts. SQL Server maintains a maximum of 200 histogram steps per statistics object.
Each histogram step contains:
0x10001100; in type-specific parsing the INT step marker is also observed as 0x00000014 (20 decimal, matching the step data size). Other types use type-specific marker values (e.g. BIGINT 0x00000018).0x040000Total step size varies by data type due to variable-length range high key encoding.
Range high key values use data-type-specific binary representations:
Integer types store range high keys as native little-endian integers:
Floating-point types use IEEE 754 format:
Special handling is required for denormalized values and special values (infinity, NaN).
String types employ variable-length encoding:
Length prefixes indicate character count for variable-length types.
Temporal types use SQL Server's internal representations:
DECIMAL and NUMERIC types use variable-length signed integer storage with implied scale. Storage size depends on precision, ranging from 5 to 17 bytes.
Binary types mirror their string counterparts:
Histogram steps must be ordered by ascending range high key values. The first step represents all values less than or equal to its range high key. Subsequent steps represent ranges between the previous step's range high key (exclusive) and the current step's range high key (inclusive).
NULL values are not represented as histogram steps. NULL statistics are maintained separately through density vector calculations. If a histogram step's range high key would be NULL, that step is omitted from the blob.
SQL Server performs strict validation on several fields during statistics blob consumption:
Checksum at global header 16-19 is validated. Incorrect values cause Error 9105.
All reserved fields containing zeros are validated. Non-zero values in these regions result in blob rejection:
Fields with redundant copies must match exactly:
Offset 44 must match:
off1 - 8Mismatch causes rejection.
Multiple length relationships must hold:
L2 = L1 - off1SQL Server validates structural properties:
Empty statistics (zero rows sampled) employ a simplified format:
Non-columnstore empty statistics: Blob contains only the 64-byte global header with no core header or histogram data. Total size is exactly 64 bytes.
Columnstore empty statistics: Blob contains global header and core header (with core header +325 set to 0x00 instead of 0x04), but no histogram data. Total size exceeds 64 bytes but contains no histogram steps.
Histograms with exactly one step may optimize storage by omitting the offset table or using inline storage. The offset table entry may contain zero to indicate the step immediately follows the offset table region.
Filtered statistics (created with WHERE predicates) set specific flags in the bitmask at offset 60-63. The unfiltered rows field (core header +184-187) contains the base table row count, while the rows field contains filtered row count.
Multi-column statistics contain multiple vector descriptors, extended density vectors, and modified histogram structures. The global header offset 60-63 bitmask encodes the column count, with different patterns for two-column versus three-or-more-column statistics.
Incremental statistics (partitioned table statistics) set bit 7 in the offset 60-63 bitmask when created with sampling. The feature flag at core header +324 uses values 0xb1 or 0xb9 to indicate incremental mode.
Trace Flag 9612 affects offset 60-63 bitmask; when enabled, often 0x0b or 0x68.
Temporal tables (system-versioned tables with history) use distinct bitmask patterns:
Statistics on memory-optimized tables set bit 1 in the offset 60-63 bitmask, typically resulting in value 0x02 or 0x0b depending on distribution characteristics.
Columnstore index statistics use specific constants and bitmask patterns:
Statistics on indexed views may set bit 6 in the offset 60-63 bitmask (value 0x40), though this behavior depends on view complexity and column count.
SQL Server 2025 RTM. All structure descriptions, field layouts, and patterns in this document reflect that version.
Format version (bytes 0-3) is 0x00000001 in analyzed blobs. Compatibility with earlier versions (2012-2022) is not confirmed.
Feature flags, bitmask patterns, and validation requirements may vary between SQL Server versions.
Features present in SQL Server 2025 RTM:
Blobs from other SQL Server versions:
0x04d00800 (or 0x00d00800 for max/large). Documented in 3.2.8.Bytes 64–71 (global header):
Constant or highly predictable:
Derived from other blob contents:
Complex state (pattern recognition):
Data-dependent content:
| Value | Binary | Interpretation |
|---|---|---|
| 0x00 | 00000000 | Baseline uniform distribution, single-column, no special features |
| 0x01 | 00000001 | Filtered statistics |
| 0x02 | 00000010 | Memory-optimized table |
| 0x0a | 00001010 | Persisted sample |
| 0x0b | 00001011 | Ascending distribution or temporal table |
| 0x38 | 00111000 | Two-column statistics |
| 0x40 | 01000000 | Indexed view |
| 0x68 | 01101000 | Trace Flag 9612 enabled, or three-or-more-column statistics |
| 0x76 | 01110110 | Temporal history table |
| 0xa2 | 10100010 | Incremental statistics with sampling |
| Value | Interpretation |
|---|---|
| 0x11 | Baseline statistics (initial creation) |
| 0x19 | Statistics after INSERT/UPDATE operations |
| 0x13 | CHAR type statistics |
| 0x1b | CHAR type after updates |
| 0x40 | Columnstore multi-column statistics |
| 0xb1 | Incremental statistics |
| 0xb9 | Incremental statistics with sampling |
The first offset table entry (entry 0) is a literal absolute byte offset in the standard case. The value at the start of the offset table is the byte offset of step 0. When the engine places histogram steps near the end of the blob, entry 0 is large (blob length minus the size of the step region and any trailing space); when steps are placed earlier in the blob, entry 0 is smaller. The feature flag at off1+324 determines the predicted table position (off1+340 or off1+356).
In some engine-written blobs, entry 0 (or other entries) at the predicted position do not match the actual step marker positions—e.g. entry 0 is out of bounds—so literal interpretation fails. A parser should then apply, in order: (1) Try the other table position (340 vs 356); if the literal table there is valid, use it. (2) Overlap-from-entry0: sequential or end-packed from the predicted table’s entry0, or a 200-step scan for known INT layouts. (3) For step_count = 1 only, table-end-relative fallback: step0 = tableEndExclusive + gap, where tableEndExclusive = tableStart + step_count×8; gap = 8 when feature_flag_324 = 0x11, else 16. The 4-byte marker at that offset must be INT-like (0x00000014 or 0x00140010).
| Type ID | SQL Type | Storage | Range High Key Size |
|---|---|---|---|
| 48 | TINYINT | 1 byte | 1 byte |
| 52 | SMALLINT | 2 bytes | 2 bytes |
| 56 | INT | 4 bytes | 4 bytes |
| 127 | BIGINT | 8 bytes | 8 bytes |
| 59 | REAL | 4 bytes | 4 bytes |
| 62 | FLOAT | 8 bytes | 8 bytes |
| 60 | MONEY | 8 bytes | 8 bytes |
| 122 | SMALLMONEY | 4 bytes | 4 bytes |
| 106 | DECIMAL/NUMERIC | Variable | Variable |
| 61 | DATETIME | 8 bytes | 8 bytes |
| 58 | SMALLDATETIME | 4 bytes | 4 bytes |
| 40 | DATE | 3 bytes | 3 bytes |
| 41 | TIME | Variable | Variable |
| 42 | DATETIME2 | Variable | Variable |
| 43 | DATETIMEOFFSET | Variable | Variable |
| 167 | VARCHAR | Variable | Variable |
| 175 | CHAR | Fixed | Fixed |
| 231 | NVARCHAR | Variable | Variable |
| 239 | NCHAR | Fixed | Fixed |
| 165 | VARBINARY | Variable | Variable |
| 173 | BINARY | Fixed | Fixed |
| 35 | TEXT | Variable | Variable |
| 99 | NTEXT | Variable | Variable |
| 34 | IMAGE | Variable | Variable |
C layout (read-only parsing). Little-endian. Use #include <stdint.h> for fixed-width types.
#pragma pack(push, 1)
/* Global header: first 64 bytes of the blob (bytes 0-63). */
typedef struct stats_blob_global_header {
uint32_t format_version; /* 0-3: Format signature; always 0x00000001 */
uint32_t vector_count; /* 4-7: Number of columns (1 for single-column stats) */
uint64_t reserved_8_15; /* 8-15: Must be zero */
uint32_t checksum; /* 16-19: Integrity field; validated by SQL Server */
uint32_t reserved_20_23; /* 20-23: Must be zero */
uint32_t stream_length_l1; /* 24-27: Total blob size in bytes */
uint32_t reserved_28_31; /* 28-31: Must be zero */
uint32_t stream_length_l2; /* 32-35: L1 - off1; size after core header start */
uint32_t reserved_36_39; /* 36-39: Must be zero */
uint32_t system_type_id; /* 40-43: First column's SQL Server type ID */
uint32_t offset_44_field; /* 44-47: Type-specific validation (off1-8 for INT, else type ID) */
uint32_t length; /* 48-51: Max byte length for first column's type */
uint32_t reserved_52_55; /* 52-55: Must be zero */
uint32_t offset_56_59; /* 56-59: String-type identifier; 0 or 0x04d00800/0x00d00800 */
uint32_t offset_60_63; /* 60-63: Bitmask (byte0=features: ascending, filtered, multi-col, etc.) */
uint32_t copy2; /* 64-67: Engine-specific; not 0x07 in fixtures */
uint32_t bytes_68_71; /* 68-71: Engine-specific; not L1; pattern (type<<8)|low */
uint32_t statistics_type_const; /* 72-75: 0x0000b3b0 single-col, 0x000a0004 multi, 0x0000b3a8 columnstore */
} stats_blob_global_header_t;
/* Vector descriptor: per-column type info; first overlaps global header at 40-51. */
typedef struct stats_blob_vector_descriptor {
uint8_t system_type_id; /* SQL Server type ID (e.g. 56=INT, 231=NVARCHAR) */
uint8_t user_type_id; /* User type; 0 for built-in */
uint16_t padding; /* Reserved; zero */
uint32_t offset_44_value; /* Type-specific validation value */
uint16_t length; /* Max storage length for type */
uint8_t precision; /* For decimal/numeric */
uint8_t scale; /* For decimal/numeric */
uint32_t collation_id; /* Optional; for string types */
} stats_blob_vector_descriptor_t;
/* Core header: starts at off1 = L1 - L2; offsets below are relative to core start. */
typedef struct stats_blob_core_header {
uint32_t constant_copy; /* +0..+3: 0x00000007 */
uint64_t update_timestamp; /* +4..+11: Last stats update (SQL Server datetime) */
uint64_t total_rows; /* +12..+19: Table row count at collection */
uint64_t rows_sampled; /* +20..+27: Rows sampled (FULLSCAN = total_rows) */
float density; /* +28..+31: Combined density */
float density_vector[33]; /* +32..+163: Prefix densities for multi-column */
uint32_t step_count_1; /* +164..+167: Number of histogram steps (max 200) */
uint32_t step_count_2; /* +168..+171: Duplicate of step count */
uint32_t vector_count; /* +172..+175: Same as global vector_count */
uint32_t step_size; /* +176..+179: Bytes per histogram step */
float avg_key_length; /* +180..+183: Average key length */
float unfiltered_rows; /* +184..+187: Base table rows for filtered stats */
uint32_t reserved_188_191; /* +188..+191: Zero */
float avg_key_length_vector[33]; /* +192..+323: Per-prefix average key length */
uint8_t feature_flag; /* +324: Feature byte (0x11 baseline, 0x19 updated, etc.) */
uint8_t format_metadata[16]; /* +325..+340: Byte0=0x04, 1-14=0, byte15=type/feature bitmask */
uint64_t histogram_length; /* +341..+348: Step region size (often step_count*(step_size+11)) */
uint64_t gap; /* +349..+356: Lifecycle (e.g. +8 initial, +80 after insert, +156 after update) */
/* Offset table follows at core + 357: array of int64, one per step, absolute blob offsets */
} stats_blob_core_header_t;
/* Fixed-length prefix of one histogram step; range_hi_key and trailer follow (variable). */
typedef struct stats_blob_histogram_step_prefix {
uint32_t marker; /* Step type; 0x10001100 */
float eq_rows; /* Rows equal to range_hi_key */
float range_rows; /* Rows in (prev_hi, this_hi) */
float avg_range_rows; /* Average rows per distinct value in range */
/* range_hi_key: variable, type-specific (section 7.3) */
/* trailer: 3 bytes 0x040000 */
} stats_blob_histogram_step_prefix_t;
#pragma pack(pop)
Notes:
stream_length_l1 - stream_length_l2; core header starts at byte off1, offset table at off1 + 357.range_hi_key (length depends on system type), then 3-byte trailer 0x040000.Document Version: 1.3
Last Updated: March 2025