SQL Server Statistics Blob Internals: Technical White Paper
Document Version: 1.2
Date: March 5, 2025
Classification: Technical Reference
Subject: Internal Structure of SQL Server STATS_STREAM Binary Format
Document Type: Read-Only Format Specification
1. Scope
- Read-only structure of statistics blobs for parsing and analysis. Blob construction or modification is not in scope.
- SQL Server 2025 RTM: binary layout, header structures, metadata encoding, data type representations, histogram storage, validation, version-specific variations.
2. Overall Architecture
2.1 Structural Overview
Five principal regions:
- Global Header (fixed 64-byte structure)
- Vector Descriptors (variable-length, data-type-specific)
- Core Header (fixed-size metadata region)
- Histogram Offset Table (array of step pointers)
- Histogram Data (variable-length step records)
Sequential reading; random access to steps via offset table. Size: 64 bytes (empty) to several KB (200 steps).
2.2 Addressing Model
Addressing:
- Absolute addressing for global header fields (offsets from blob start)
- Relative addressing for core header fields (offsets from computed
off1 position)
- Indirect addressing for histogram steps (via offset table)
off1 = core header start = L1 - L2 (from global header).
2.3 Byte Order and Alignment
Little-endian. Natural alignment where possible. Reserved fields must be zero.
3. Global Header Structure
3.1 Purpose and Layout
First 64 bytes: format ID, sizes, data type descriptors, validation fields. Present in all blobs.
3.2 Header Fields
3.2.1 Format Version (Bytes 0-3)
Four-byte format version; value 0x00000001. Format signature for validation.
3.2.2 Vector Count (Bytes 4-7)
Four-byte number of vectors (columns). 1 = single-column; 2–32 = multi-column.
3.2.3 Reserved Regions (Bytes 8-15, 20-23, 28-31, 36-39)
Reserved regions (zeros). Non-zero causes rejection.
3.2.4 Checksum Field (Bytes 16-19)
Four-byte integrity validation field. SQL Server rejects blobs with incorrect values (Error 9105).
3.2.5 Stream Length L1 (Bytes 24-27)
Four-byte total blob length in bytes (all headers, vectors, histogram).
3.2.6 Stream Length L2 (Bytes 32-35)
Four-byte size of content after core header start. L2 = L1 - off1.
3.2.7 Data Type Fields (Bytes 40-51)
Overlaps first vector descriptor. Primary column data type:
- System Type ID (bytes 40-43): SQL Server internal type identifier
- Offset 44 Field (bytes 44-47): Data-type-specific validation value following pattern-based rules
- Length (bytes 48-51): Maximum byte length for the data type
Offset 44: integer types = off1 - 8; other types = system type ID.
3.2.8 Offset 56-59 Region
Four-byte string-type identifier. Zero for non-string types. For string types (VARCHAR 167, NVARCHAR 231, CHAR 175, NCHAR 239, TEXT 35, NTEXT 99), when the descriptor region does not overlap bytes 56–59: value 0x04d00800 (or 0x00d00800 for max/large length). Fixture analysis: all non-zero values in sampled blobs are 0x04d00800 and correspond to string-column statistics. Extract from original when non-zero if not string-type.
3.2.9 Offset 60-63 Bitmask Field
A four-byte validation field where byte 0 functions as a bitmask encoding various statistics modes and features:
Bit 0: Ascending key detection flag—set for ascending distributions, filtered statistics, and temporal tables
Bit 1: Distribution or memory-optimized flag—indicates special distribution handling or memory-optimized table statistics
Bit 2: Temporal history exclusive flag—set only for temporal history tables
Bit 3: Multi-purpose mode indicator—set for Trace Flag 9612 mode, ascending key mode, multi-column statistics, temporal tables, and persisted samples
Bit 4: Multi-column indicator—set for statistics with two or more columns, also for temporal history
Bit 5: Complex flag—set for multi-column statistics, TF9612 mode, temporal history, and combinations of persisted sample with filtered statistics
Bit 6: Advanced features flag—set for TF9612 mode, multi-column statistics with three or more columns, indexed views, columnstore indexes, and temporal history
Bit 7: Incremental statistics flag—set conditionally for incremental statistics created with sampling (not full scan)
Common bitmask patterns include 0x00 (baseline uniform distribution), 0x0b (ascending distribution), 0x68 (Trace Flag 9612 or multi-column with 3+ columns), 0x38 (two-column statistics), and 0x76 (temporal history tables).
3.2.10 Copy Fields (Bytes 64-75)
Three four-byte fields; bytes 72-75 are well-established; bytes 64-71 are engine-specific:
- Bytes 64-67: Engine-specific; never 0x00000007 in engine blobs. Typical layout: high 16 bits 0x0003; lower 16 bits type-dependent (e.g. 0x00033800 INT, 0x48037F00 BIGINT). The byte at +1 is 0x07 for auto-created column statistics (names ST_, WA_Sys) and equals SystemTypeId for index statistics (names PK_, FK_, ix_, UQ_).
- Bytes 68-71: Engine-specific; not a copy of L1. Stored little-endian as (type_byte << 8) | low_byte. High byte is usually SystemTypeId (datetime/decimal/float use 0x38). Low byte is context-dependent; no single formula.
- Bytes 72-75: Fixed constant by statistics kind: 0x0000b3b0 (single-column), 0x000a0004 (multi-column), 0x0000b3a8 (columnstore).
4. Vector Descriptors
4.1 Structure and Purpose
Vector descriptors provide detailed data type metadata for each column in the statistics. These variable-length structures begin at byte 40 of the global header and extend for 12 bytes per column (16 bytes if the type includes collation information). The first vector descriptor overlaps with global header bytes 40-51.
4.2 Descriptor Fields
Each vector descriptor contains:
- System Type ID (1 byte): SQL Server internal type identifier encoding the data type
- User Type ID (1 byte): User-defined type identifier (zero for built-in types)
- Padding (2 bytes): Reserved zeros
- Offset 44 Validation Field (4 bytes): Type-specific validation value
- Length (2 bytes): Maximum storage length for the data type
- Precision (1 byte): Numeric precision for decimal/numeric types
- Scale (1 byte): Numeric scale for decimal/numeric types
- Collation ID (4 bytes, optional): Collation identifier for string types
4.3 System Type Identifiers
Common system type IDs include:
- 56: INT (4-byte integer)
- 127: BIGINT (8-byte integer)
- 48: TINYINT (1-byte integer)
- 52: SMALLINT (2-byte integer)
- 167: VARCHAR (variable-length single-byte string)
- 231: NVARCHAR (variable-length Unicode string)
- 175: CHAR (fixed-length single-byte string)
- 239: NCHAR (fixed-length Unicode string)
- 61: DATETIME (8-byte datetime)
- 42: DATETIME2 (variable-length datetime with precision)
- 40: DATE (3-byte date)
- 41: TIME (variable-length time with precision)
- 43: DATETIMEOFFSET (variable-length datetime with timezone)
- 106: DECIMAL/NUMERIC (variable-length decimal)
- 62: FLOAT (8-byte double precision)
- 59: REAL (4-byte single precision)
- 60: MONEY (8-byte currency)
- 122: SMALLMONEY (4-byte currency)
- 165: VARBINARY (variable-length binary)
- 173: BINARY (fixed-length binary)
4.4 Multi-Column Considerations
Multi-column: vector descriptors repeat per column; count in global header. Padding to alignment before core header.
5. Core Header
5.1 Location and Size
Core header at off1 = L1 - L2; 357 bytes fixed layout.
5.2 Metadata Fields
5.2.1 Constant Copy (Bytes +0 to +3)
Four-byte constant. Global header 64-67 is not 0x07 in sampled blobs; core header +0 to +3 may be 0x00000007 in engine blobs—validate per implementation.
5.2.2 Update Timestamp (Bytes +4 to +11)
Eight-byte last statistics update time (SQL Server internal datetime).
5.2.3 Row Counts (Bytes +12 to +27)
Two eight-byte integer fields:
- Total Rows (bytes +12 to +19): Complete row count in the table at statistics collection time
- Rows Sampled (bytes +20 to +27): Number of rows examined during statistics collection (equals total rows for FULLSCAN, less for sampled statistics)
5.2.4 Density Information (Bytes +28 to +163)
A complex region containing density values used for selectivity estimation:
- Combined Density (bytes +28 to +31): Single four-byte float representing overall column selectivity
- Density Vector (bytes +32 to +163): Array of 33 four-byte float values providing prefix densities for multi-column statistics
Prefix densities for multi-column selectivity.
5.2.5 Histogram Metadata (Bytes +164 to +187)
Critical fields describing histogram structure:
- Step Count Copy 1 (bytes +164 to +167): Number of histogram steps (maximum 200)
- Step Count Copy 2 (bytes +168 to +171): Duplicate step count for validation
- Vector Count (bytes +172 to +175): Redundant copy of global header vector count
- Step Size (bytes +176 to +179): Byte size of each histogram step structure
- Average Key Length (bytes +180 to +183): Four-byte float indicating average key size
- Unfiltered Rows (bytes +184 to +187): Four-byte float for filtered statistics showing base table row count
5.2.6 Reserved Region (Bytes +188 to +191)
Four-byte reserved field containing zeros across all analyzed blobs.
5.2.7 Average Key Length Vector (Bytes +192 to +323)
Array of 33 four-byte float values providing average key lengths for each prefix in multi-column statistics.
5.2.8 Feature Flag (Byte +324)
Single-byte flag encoding statistics features. Common values include:
- 0x11 (17): Baseline statistics (30.4% of blobs)
- 0x19 (25): Most common value, often appears after INSERT/UPDATE operations (57.3%)
- 0x13 (19): CHAR type variant (5.1%)
- 0x1b (27): CHAR type with updates (6.6%)
- 0x40 (64): Columnstore multi-column statistics (0.1%)
- 0xb1 (177): Incremental statistics (0.2%)
- 0xb9 (185): Incremental statistics with sampling (0.4%)
Encodes data type, update status, features. Correlation: The bit at 0x02 predicts the histogram offset table position: (flag & 0x02) != 0 → table at off1+340; else → off1+356. See docs/FEATURE_FLAG_OFF1_324.md for definition and correlation summary.
5.2.9 Format Metadata Field (Bytes +325 to +340)
Sixteen-byte structured field with complex encoding:
Byte +325: Format identifier, constant 0x04 in 99.9% of blobs. Exception: empty columnstore statistics (step count and row count both zero) use 0x00. Note that non-columnstore empty statistics use a completely different format with only 64-byte blobs containing no core header.
Bytes +326 to +339: Reserved padding containing zeros.
Byte +340: Bitmask encoding system type base value and feature flags. The bitmask formula is base_value_for_system_type OR feature_bits, where:
System type base values vary by data type:
- Type 56 (INT): base
0x38 in 47% of cases
- Type 127 (BIGINT): base
0x58 or 0xf9 in approximately 46% each
- Type 106 (FLOAT): base
0x20 or 0x2c in approximately 48% each
Feature bits overlay the base value:
- Bits 0 and 2: Set for regular statistics (excluding multi-column, incremental, persisted sample, and norecompute options)
- Bit 1: Set for non-columnstore statistics
- Bits 3, 4, and 5: Set universally for multi-column, incremental, persisted sample, or norecompute statistics
- Bit 6: Set exclusively for string index statistics
- Bit 7: Set for non-columnstore statistics
Bits 1 and 7 both encode non-columnstore; exact separate meaning of each bit is not required for round-trip when using per-type observed values (e.g. INT 0x38, BIGINT 0x58, NVARCHAR 0x50). This byte cannot distinguish multi-column, incremental, persisted sample, norecompute when only one is set.
5.2.10 Histogram Length (Bytes +340 to +347) — when 16-byte block present
Only when the 16-byte metadata block is present (feature flag at +324 has bit 0x02 clear → offset table at off1+356): the eight bytes at off1+340..off1+347 are the histogram length — little-endian uint64 = total size of the histogram step region. Usually step_count × (step_size + 11) (11 = marker 4 + trailer 3 + overhead 4). Exceptions: IMAGE, NTEXT, TEXT.
When the block is absent (bit 0x02 set → table at off1+340), bytes 340–347 are the first eight bytes of the offset table (first step offset), not histogram length. See §6.2 and docs/FEATURE_FLAG_OFF1_324.md.
5.2.11 Gap Field (Bytes +348 to +355) — when 16-byte block present
Only when the 16-byte block is present (offset table at off1+356): the next eight bytes are lifecycle/metadata:
- off1+348 .. off1+351 (4 bytes): Gap value =
histogram_length + N, little-endian uint32. N ∈ {8, 80, 156, 232, 448, 509} (e.g. +8 initial, +80 after INSERT, +156 after UPDATE).
- off1+352 .. off1+355 (4 bytes): Zero.
When the block is absent (table at off1+340), bytes 348–355 are bytes 9–16 of the offset table (second step offset), not gap. See §6.2 and docs/FEATURE_FLAG_OFF1_324.md.
6. Histogram Offset Table
6.1 Purpose
Offset table: one entry per histogram step. Each entry is the absolute byte offset from blob start to that step (standard layout), or a relative offset in low-compression layouts.
6.2 Structure
One eight-byte little-endian integer per step. Size = step_count × 8 bytes (e.g. 1,600 for 200 steps).
The table starts at off1+340 or off1+356 only, as predicted by the feature flag at off1+324 (see implementation note below).
6.3 Addressing Scheme
Offset values are absolute positions within the blob. Each table entry is a literal absolute byte offset: the eight-byte value at table start + N × 8 is the byte offset of step N. To access step N, read that value and seek to that position. A large value (e.g. near blob length) indicates the step region is placed near the end of the blob; a small value indicates placement nearer the core header. No encoding or decode step is applied to any entry.
Observed gap: In some engine-written blobs, reading every entry as a literal step offset yields wrong step positions: e.g. table entry 0 may not equal the byte offset of the first step marker in the blob, or entry 0 may be out of bounds, so a parser that seeks to entry 0 reads the wrong bytes or fails. When entry 1, 2, … fall inside the table’s byte range, treating them as step starts causes the parser to read entry bytes as step data. The engine still produces the correct histogram (as shown by DBCC SHOW_STATISTICS WITH HISTOGRAM).
Parser fallback (decision order): (1) Read the offset table at the predicted position (off1+340 or off1+356 from the feature flag). (2) If any entry is out of range or overflow, try the other table position (340 vs 356); if the literal table there is valid, use it and resolve overlap if needed. (3) When literal at both positions fails, use overlap-from-entry0: sequential layout (stride = 4 + stepSize) from the first table entry at the predicted position, or end-packed layout (step block at end of blob) for INT (marker 0x00000014), or a 200-step scan for known INT blobs. (4) For step_count = 1 only, when literal and overlap both fail, use the table-end-relative rule: step0 = tableEndExclusive + gap, where tableEndExclusive = tableStart + step_count×8; gap = 8 when feature_flag_324 = 0x11, else 16. The 4-byte marker at that offset must be INT-like (0x00000014 or 0x00140010).
Step block after table: In many blobs the first step lies at a fixed offset after the offset table: step0Offset = tableEndExclusive + gap, with gap 8 (feature_flag_324 = 0x11) or 16 (0x19, 0x1b, etc.). The fallback in (4) above implements this for single-step histograms when earlier steps fail.
6.4 Special Cases
Single-step histograms may omit the offset table or use offset value zero to indicate inline storage. Empty histograms (step count zero) result in no offset table presence.
Header step_count (+164, +168) can exceed the number of physically stored or decodable steps. Engine may reserve space or use a compressed representation. Decoded step count may be less than header step_count; layout logic must choose header value or decoded count.
Implementation note (offset table use). The feature flag at off1+324 determines the predicted table position: if (flag & 0x02) != 0 (e.g. 0x13, 0x1b), the 16-byte histogram-length and gap block is absent and the table starts at off1+340; if the bit is clear (e.g. 0x11, 0x19), the block is present and the table starts at off1+356. The table is only at one of these two positions. Parsers should try the predicted position first; if reading the table there yields out-of-bounds or invalid entries, try the other position (340 vs 356) before applying overlap or table-end-relative fallbacks. For step_count = 1, when both positions and overlap fail, use step0 = tableEndExclusive + gap (gap = 8 for flag 0x11, else 16).
6.5 Table data space (entry array and step payloads)
The data space of the offset table is the logical region the engine uses for histogram table data. It comprises:
- Entry array: The 8-byte-per-step block starting at off1+340 or off1+356 (size = step_count × 8). Each entry is the literal absolute byte offset of that step’s start.
- Step payloads: Step data (eq/range/avg, range high key, trailer) is written at those absolute offsets. The engine may place step payload bytes anywhere, including in the same byte range as the entry array. So that byte range can hold both entry bytes and step payload bytes (dual-use). What gets written there is table data.
A parser that reads the table region only as contiguous 8-byte entries may fail to re-parse after in-place scaling when the engine has stored step payload in that range (overlap). Scaling tools should validate that no step payload overlaps the table byte range before overwriting, or use the same fallback order as above to re-locate steps.
7. Histogram Steps
7.1 Overview
Histogram steps contain the statistical distribution data used for cardinality estimation. Each step represents a value range with associated row counts. SQL Server maintains a maximum of 200 histogram steps per statistics object.
7.2 Step Structure
Each histogram step contains:
- Marker (4 bytes): Step type identifier. Commonly documented as
0x10001100; in type-specific parsing the INT step marker is also observed as 0x00000014 (20 decimal, matching the step data size). Other types use type-specific marker values (e.g. BIGINT 0x00000018).
- Equal Rows (4 bytes): Single-precision float indicating estimated rows equal to the range high key
- Range Rows (4 bytes): Single-precision float indicating estimated rows greater than previous step's range high key and less than this step's range high key
- Average Range Rows (4 bytes): Single-precision float representing average rows per distinct value within the range
- Range High Key (variable length): Upper bound value for this step in native data type format
- Trailer (3 bytes): Step terminator, consistently
0x040000
Total step size varies by data type due to variable-length range high key encoding.
7.3 Range High Key Encoding
Range high key values use data-type-specific binary representations:
7.3.1 Integer Types
Integer types store range high keys as native little-endian integers:
- INT: 4-byte signed integer
- BIGINT: 8-byte signed integer
- SMALLINT: 2-byte signed integer
- TINYINT: 1-byte unsigned integer
7.3.2 Floating-Point Types
Floating-point types use IEEE 754 format:
- FLOAT: 8-byte double precision
- REAL: 4-byte single precision
Special handling is required for denormalized values and special values (infinity, NaN).
7.3.3 String Types
String types employ variable-length encoding:
- VARCHAR: Two-byte length prefix followed by single-byte characters
- NVARCHAR: Two-byte length prefix followed by UTF-16 characters
- CHAR: Fixed-length representation with padding
- NCHAR: Fixed-length UTF-16 with padding
Length prefixes indicate character count for variable-length types.
7.3.4 Date and Time Types
Temporal types use SQL Server's internal representations:
- DATETIME: 8-byte structure (4 bytes for days since 1900-01-01, 4 bytes for time fraction)
- DATETIME2: Variable-length based on precision (3 to 8 bytes)
- DATE: 3-byte integer representing days since base date
- TIME: Variable-length based on precision (3 to 5 bytes)
- DATETIMEOFFSET: Variable-length including timezone offset (8 to 10 bytes)
- SMALLDATETIME: 4-byte structure with reduced precision
7.3.5 Decimal and Numeric Types
DECIMAL and NUMERIC types use variable-length signed integer storage with implied scale. Storage size depends on precision, ranging from 5 to 17 bytes.
7.3.6 Binary Types
Binary types mirror their string counterparts:
- VARBINARY: Two-byte length prefix followed by binary data
- BINARY: Fixed-length representation
7.4 Step Ordering
Histogram steps must be ordered by ascending range high key values. The first step represents all values less than or equal to its range high key. Subsequent steps represent ranges between the previous step's range high key (exclusive) and the current step's range high key (inclusive).
7.5 NULL Representation
NULL values are not represented as histogram steps. NULL statistics are maintained separately through density vector calculations. If a histogram step's range high key would be NULL, that step is omitted from the blob.
8. Validation and Integrity
8.1 Critical Validation Fields
SQL Server performs strict validation on several fields during statistics blob consumption:
8.1.1 Checksum (Bytes 16-19)
Checksum at global header 16-19 is validated. Incorrect values cause Error 9105.
8.1.2 Reserved Fields
All reserved fields containing zeros are validated. Non-zero values in these regions result in blob rejection:
- Global header bytes 8-15
- Global header bytes 20-23
- Global header bytes 28-31
- Global header bytes 36-39
- Global header bytes 52-55 (partially)
- Core header bytes +188 to +191
- Core header bytes +352 to +355
8.1.3 Redundant Copies
Fields with redundant copies must match exactly:
- Step count copy 1 and copy 2 (core header +164-167 and +168-171)
- Stream length L1 (global header 24-27); bytes 68-71 engine-specific
- Constant copies (global header 64-67 engine-specific; core header +0-3)
- Vector count copies (global header 4-7 and core header +172-175)
8.1.4 Offset 44 Validation
Offset 44 must match:
- For INT type: equals
off1 - 8
- For other types: equals system type ID
Mismatch causes rejection.
8.2 Length Consistency
Multiple length relationships must hold:
L2 = L1 - off1
- Total blob size equals L1
- Histogram length matches step count and step size calculations
- Offset table entries point within blob boundaries
8.3 Structural Integrity
SQL Server validates structural properties:
- All offsets resolve to valid addresses within blob bounds
- Histogram steps are ordered correctly
- Step count does not exceed 200
- Data type encodings match vector descriptor specifications
9. Special Cases and Variations
9.1 Empty Statistics
Empty statistics (zero rows sampled) employ a simplified format:
Non-columnstore empty statistics: Blob contains only the 64-byte global header with no core header or histogram data. Total size is exactly 64 bytes.
Columnstore empty statistics: Blob contains global header and core header (with core header +325 set to 0x00 instead of 0x04), but no histogram data. Total size exceeds 64 bytes but contains no histogram steps.
9.2 Single-Step Histograms
Histograms with exactly one step may optimize storage by omitting the offset table or using inline storage. The offset table entry may contain zero to indicate the step immediately follows the offset table region.
9.3 Filtered Statistics
Filtered statistics (created with WHERE predicates) set specific flags in the bitmask at offset 60-63. The unfiltered rows field (core header +184-187) contains the base table row count, while the rows field contains filtered row count.
9.4 Multi-Column Statistics
Multi-column statistics contain multiple vector descriptors, extended density vectors, and modified histogram structures. The global header offset 60-63 bitmask encodes the column count, with different patterns for two-column versus three-or-more-column statistics.
9.5 Incremental Statistics
Incremental statistics (partitioned table statistics) set bit 7 in the offset 60-63 bitmask when created with sampling. The feature flag at core header +324 uses values 0xb1 or 0xb9 to indicate incremental mode.
9.6 Trace Flag Impact
Trace Flag 9612 affects offset 60-63 bitmask; when enabled, often 0x0b or 0x68.
9.7 Temporal Tables
Temporal tables (system-versioned tables with history) use distinct bitmask patterns:
- Regular temporal tables: offset 60-63 typically equals 0x0b
- Temporal history tables: offset 60-63 typically equals 0x76
- Bit 2 in the bitmask is set exclusively for temporal history tables
9.8 Memory-Optimized Tables
Statistics on memory-optimized tables set bit 1 in the offset 60-63 bitmask, typically resulting in value 0x02 or 0x0b depending on distribution characteristics.
9.9 Columnstore Indexes
Columnstore index statistics use specific constants and bitmask patterns:
- Offset 72-75 constant: 0x0000b3a8 instead of 0x0000b3b0
- Core header +324 feature flag: 0x40 for multi-column columnstore
- Empty columnstore: core header +325 uses 0x00 instead of 0x04
9.10 Indexed Views
Statistics on indexed views may set bit 6 in the offset 60-63 bitmask (value 0x40), though this behavior depends on view complexity and column count.
10. Version Information
10.1 Analysis Version
SQL Server 2025 RTM. All structure descriptions, field layouts, and patterns in this document reflect that version.
10.2 Format Stability
Format version (bytes 0-3) is 0x00000001 in analyzed blobs. Compatibility with earlier versions (2012-2022) is not confirmed.
10.3 Version Variability
Feature flags, bitmask patterns, and validation requirements may vary between SQL Server versions.
10.4 Version-Specific Features
Features present in SQL Server 2025 RTM:
- Incremental statistics support
- Columnstore index statistics
- Memory-optimized table statistics
- Temporal table statistics (system-versioned tables)
- Filtered statistics with WHERE predicates
- Multi-column statistics (up to 32 columns)
- Persisted sample percentage option
10.5 Compatibility Considerations
Blobs from other SQL Server versions:
- Field offsets and basic structure may be consistent
- Feature flag values and bitmask patterns may differ
- Validation requirements may be version-specific
- Always verify format details against the specific SQL Server version in use
11. Variable-Length and Resolved Observations
11.1 Resolved (fixture analysis)
- Offset 56-59: String-type identifier. Non-zero only for string column statistics; value
0x04d00800 (or 0x00d00800 for max/large). Documented in 3.2.8.
- Core header +325-340 byte 15 (bits 1 and 7): Byte 15 is type-dependent (base value per system type); bits 1 and 7 are part of that encoding (e.g. non-columnstore). Per-type values suffice for round-trip; individual bit semantics not required. Documented in 5.2.9.
- Deprecated types (IMAGE, NTEXT, TEXT): No fixtures in set; builder uses type IDs 34, 35, 99 and type-specific offset_44 and byte 15. Histogram layout not verified against engine blobs.
11.2 Bytes 64–71 (global header)
Bytes 64–71 (global header):
- 64–67: High 16 bits often 0x0003; second byte is type byte (SystemTypeId, or 56 for datetime/decimal/float). When L1 < 65536 the engine often writes 0x07 in the second byte for auto-created column statistics; first byte can correlate with (L1>>24)&0xFF when L1 ≥ 256. 64–67 are validated by the engine (mutant rejected).
- 68–71: Never equal to L1. Layout (LE): (type_byte << 8) | low_byte. High byte = type byte; low byte = (L1>>16)&0xFF when L1 ≥ 65536, with other rules for filtered stats (255), step-count bands, and type (e.g. Float → 7). 68–71 are overwritten by the engine when zeroed.
11.3 Variable-length (string) histogram structure
- Fixed-length (e.g. INT, BIGINT): step layout = marker (4) | equal rows, range rows, average range rows (12 bytes) | range high key | distinctRangeRows (4) | trailer (3). Offset table points to step start (marker or float region).
- Variable-length (VARCHAR, NVARCHAR): offset table may point at the range high key (KEY data) rather than at the marker. The three floats lie 12 bytes before that key. Order is floats then key; addressing is key-centric. Implementations may use marker/key detection and an offset 12 bytes earlier to decode steps.
- Core header +325 bit 6 is set exclusively for string index statistics.
Appendix A: Field Parsing Reference
Constant Fields (Predictable Values)
Constant or highly predictable:
- Format version (bytes 0-3): constant 0x00000001 across all versions
- Copy 2 (bytes 64-67): engine-specific; not 0x00000007 in sampled fixtures
- Statistics type constant (bytes 72-75): type-specific constant (0x0000b3b0, 0x000a0004, or 0x0000b3a8)
- Core header constant (core +0-3): often 0x00000007; validate per implementation
- Core header reserved (core +188-191): zeros
- Core header padding (core +352-355): zeros
- Core header format field bytes 0-14 (core +325-339): byte 0 = 0x04 (or 0x00 for empty columnstore), bytes 1-14 = zeros
Derived Fields (Calculated from Other Fields)
Derived from other blob contents:
- Stream length L2 (bytes 32-35): equals L1 - off1
- Bytes 68-71: engine-specific; not L1 in sampled fixtures; pattern (type_byte<<8)|low_byte
- Core offset (off1): calculated as L1 - L2
- Histogram length (core +340-347): typically step_count × (step_size + 11), with exceptions for deprecated types
Complex Fields (Require Pattern Recognition)
Complex state (pattern recognition):
- Checksum (bytes 16-19): four-byte validation field (algorithm not publicly documented)
- Offset 44 (bytes 44-47): type-specific validation value (off1 - 8 for INT, system_type_id for others)
- Offset 60-63: bitmask encoding statistics features and modes
- Core header +324: feature flag encoding data type and update status
- Core header +340 (byte 15): bitmask combining system type base value and feature flags
- Gap field (core +348-351): histogram_length + update_offset encoding lifecycle state
Variable Content Fields
Data-dependent content:
- Vector count (bytes 4-7): number of columns in statistics
- Stream length L1 (bytes 24-27): total blob size
- Offset 56-59: string-type identifier; 0 for non-string, 0x04d00800 (or 0x00d00800) for string types when not overlapping descriptor
- Update timestamp (core +4-11): last statistics update time
- Row counts (core +12-27): table and sample row counts
- Density information (core +28-163): density vector values
- Histogram metadata (core +164-187): step count, step size, average key length
- Average key length vector (core +192-323): multi-column key length data
Appendix B: Common Bitmask Values
Offset 60-63 Byte 0 Patterns
| Value |
Binary |
Interpretation |
| 0x00 |
00000000 |
Baseline uniform distribution, single-column, no special features |
| 0x01 |
00000001 |
Filtered statistics |
| 0x02 |
00000010 |
Memory-optimized table |
| 0x0a |
00001010 |
Persisted sample |
| 0x0b |
00001011 |
Ascending distribution or temporal table |
| 0x38 |
00111000 |
Two-column statistics |
| 0x40 |
01000000 |
Indexed view |
| 0x68 |
01101000 |
Trace Flag 9612 enabled, or three-or-more-column statistics |
| 0x76 |
01110110 |
Temporal history table |
| 0xa2 |
10100010 |
Incremental statistics with sampling |
Core Header +324 Feature Flag Values
| Value |
Interpretation |
| 0x11 |
Baseline statistics (initial creation) |
| 0x19 |
Statistics after INSERT/UPDATE operations |
| 0x13 |
CHAR type statistics |
| 0x1b |
CHAR type after updates |
| 0x40 |
Columnstore multi-column statistics |
| 0xb1 |
Incremental statistics |
| 0xb9 |
Incremental statistics with sampling |
Appendix C: Offset table entry 0 and parser decision order
The first offset table entry (entry 0) is a literal absolute byte offset in the standard case. The value at the start of the offset table is the byte offset of step 0. When the engine places histogram steps near the end of the blob, entry 0 is large (blob length minus the size of the step region and any trailing space); when steps are placed earlier in the blob, entry 0 is smaller. The feature flag at off1+324 determines the predicted table position (off1+340 or off1+356).
In some engine-written blobs, entry 0 (or other entries) at the predicted position do not match the actual step marker positions—e.g. entry 0 is out of bounds—so literal interpretation fails. A parser should then apply, in order: (1) Try the other table position (340 vs 356); if the literal table there is valid, use it. (2) Overlap-from-entry0: sequential or end-packed from the predicted table’s entry0, or a 200-step scan for known INT layouts. (3) For step_count = 1 only, table-end-relative fallback: step0 = tableEndExclusive + gap, where tableEndExclusive = tableStart + step_count×8; gap = 8 when feature_flag_324 = 0x11, else 16. The 4-byte marker at that offset must be INT-like (0x00000014 or 0x00140010).
Appendix D: System Type ID Reference
| Type ID |
SQL Type |
Storage |
Range High Key Size |
| 48 |
TINYINT |
1 byte |
1 byte |
| 52 |
SMALLINT |
2 bytes |
2 bytes |
| 56 |
INT |
4 bytes |
4 bytes |
| 127 |
BIGINT |
8 bytes |
8 bytes |
| 59 |
REAL |
4 bytes |
4 bytes |
| 62 |
FLOAT |
8 bytes |
8 bytes |
| 60 |
MONEY |
8 bytes |
8 bytes |
| 122 |
SMALLMONEY |
4 bytes |
4 bytes |
| 106 |
DECIMAL/NUMERIC |
Variable |
Variable |
| 61 |
DATETIME |
8 bytes |
8 bytes |
| 58 |
SMALLDATETIME |
4 bytes |
4 bytes |
| 40 |
DATE |
3 bytes |
3 bytes |
| 41 |
TIME |
Variable |
Variable |
| 42 |
DATETIME2 |
Variable |
Variable |
| 43 |
DATETIMEOFFSET |
Variable |
Variable |
| 167 |
VARCHAR |
Variable |
Variable |
| 175 |
CHAR |
Fixed |
Fixed |
| 231 |
NVARCHAR |
Variable |
Variable |
| 239 |
NCHAR |
Fixed |
Fixed |
| 165 |
VARBINARY |
Variable |
Variable |
| 173 |
BINARY |
Fixed |
Fixed |
| 35 |
TEXT |
Variable |
Variable |
| 99 |
NTEXT |
Variable |
Variable |
| 34 |
IMAGE |
Variable |
Variable |
Appendix E: C Structure Definitions
C layout (read-only parsing). Little-endian. Use #include <stdint.h> for fixed-width types.
#pragma pack(push, 1)
/* Global header: first 64 bytes of the blob (bytes 0-63). */
typedef struct stats_blob_global_header {
uint32_t format_version; /* 0-3: Format signature; always 0x00000001 */
uint32_t vector_count; /* 4-7: Number of columns (1 for single-column stats) */
uint64_t reserved_8_15; /* 8-15: Must be zero */
uint32_t checksum; /* 16-19: Integrity field; validated by SQL Server */
uint32_t reserved_20_23; /* 20-23: Must be zero */
uint32_t stream_length_l1; /* 24-27: Total blob size in bytes */
uint32_t reserved_28_31; /* 28-31: Must be zero */
uint32_t stream_length_l2; /* 32-35: L1 - off1; size after core header start */
uint32_t reserved_36_39; /* 36-39: Must be zero */
uint32_t system_type_id; /* 40-43: First column's SQL Server type ID */
uint32_t offset_44_field; /* 44-47: Type-specific validation (off1-8 for INT, else type ID) */
uint32_t length; /* 48-51: Max byte length for first column's type */
uint32_t reserved_52_55; /* 52-55: Must be zero */
uint32_t offset_56_59; /* 56-59: String-type identifier; 0 or 0x04d00800/0x00d00800 */
uint32_t offset_60_63; /* 60-63: Bitmask (byte0=features: ascending, filtered, multi-col, etc.) */
uint32_t copy2; /* 64-67: Engine-specific; not 0x07 in fixtures */
uint32_t bytes_68_71; /* 68-71: Engine-specific; not L1; pattern (type<<8)|low */
uint32_t statistics_type_const; /* 72-75: 0x0000b3b0 single-col, 0x000a0004 multi, 0x0000b3a8 columnstore */
} stats_blob_global_header_t;
/* Vector descriptor: per-column type info; first overlaps global header at 40-51. */
typedef struct stats_blob_vector_descriptor {
uint8_t system_type_id; /* SQL Server type ID (e.g. 56=INT, 231=NVARCHAR) */
uint8_t user_type_id; /* User type; 0 for built-in */
uint16_t padding; /* Reserved; zero */
uint32_t offset_44_value; /* Type-specific validation value */
uint16_t length; /* Max storage length for type */
uint8_t precision; /* For decimal/numeric */
uint8_t scale; /* For decimal/numeric */
uint32_t collation_id; /* Optional; for string types */
} stats_blob_vector_descriptor_t;
/* Core header: starts at off1 = L1 - L2; offsets below are relative to core start. */
typedef struct stats_blob_core_header {
uint32_t constant_copy; /* +0..+3: 0x00000007 */
uint64_t update_timestamp; /* +4..+11: Last stats update (SQL Server datetime) */
uint64_t total_rows; /* +12..+19: Table row count at collection */
uint64_t rows_sampled; /* +20..+27: Rows sampled (FULLSCAN = total_rows) */
float density; /* +28..+31: Combined density */
float density_vector[33]; /* +32..+163: Prefix densities for multi-column */
uint32_t step_count_1; /* +164..+167: Number of histogram steps (max 200) */
uint32_t step_count_2; /* +168..+171: Duplicate of step count */
uint32_t vector_count; /* +172..+175: Same as global vector_count */
uint32_t step_size; /* +176..+179: Bytes per histogram step */
float avg_key_length; /* +180..+183: Average key length */
float unfiltered_rows; /* +184..+187: Base table rows for filtered stats */
uint32_t reserved_188_191; /* +188..+191: Zero */
float avg_key_length_vector[33]; /* +192..+323: Per-prefix average key length */
uint8_t feature_flag; /* +324: Feature byte (0x11 baseline, 0x19 updated, etc.) */
uint8_t format_metadata[16]; /* +325..+340: Byte0=0x04, 1-14=0, byte15=type/feature bitmask */
uint64_t histogram_length; /* +341..+348: Step region size (often step_count*(step_size+11)) */
uint64_t gap; /* +349..+356: Lifecycle (e.g. +8 initial, +80 after insert, +156 after update) */
/* Offset table follows at core + 357: array of int64, one per step, absolute blob offsets */
} stats_blob_core_header_t;
/* Fixed-length prefix of one histogram step; range_hi_key and trailer follow (variable). */
typedef struct stats_blob_histogram_step_prefix {
uint32_t marker; /* Step type; 0x10001100 */
float eq_rows; /* Rows equal to range_hi_key */
float range_rows; /* Rows in (prev_hi, this_hi) */
float avg_range_rows; /* Average rows per distinct value in range */
/* range_hi_key: variable, type-specific (section 7.3) */
/* trailer: 3 bytes 0x040000 */
} stats_blob_histogram_step_prefix_t;
#pragma pack(pop)
Notes:
- off1 =
stream_length_l1 - stream_length_l2; core header starts at byte off1, offset table at off1 + 357.
- Histogram steps are reached via the offset table; each entry is an 8-byte absolute offset into the blob. Step layout: prefix above, then
range_hi_key (length depends on system type), then 3-byte trailer 0x040000.
- Reserved/padding zero required. Checksum (16-19): CRC-32 (poly 0xEDB88320) over blob with 16-19 zeroed, then XOR 0xFFFFFFFF and NOT.
Document Version: 1.3
Last Updated: March 2025