DPIR: Data Propagation Intermediate Representation
Traditional data propagation generally uses a DXL (Data eXchange Language, e.g. XML, YAML, JSON) as the data carrier.
This approach has two drawbacks:
- DXLs are very high-level: convenient for users to edit, but inconvenient for programs to parse.
- There are too many DXLs, and programs that use different DXLs can't exchange data directly.
Here I introduce a better way to propagate data: design an intermediate representation for it.
I call it DPIR (Data Propagation Intermediate Representation).
DPIR should be low-level enough for programs to access efficiently, and expressive enough to be transformed to and from DXLs with no loss of information.
Preliminary Design of DPIR
String encoding is another problem in data propagation. To avoid it, DPIR saves strings in a separate file: the StrFile.
Non-string data is saved in the DataFile.
StrFile
A StrFile's name has the form name.dpir-str-encode, where name is the source file's name and encode is the string encoding used.
For example, for a DPIR generated from config.json, the StrFile's name has the form config.dpir-str-encode:
for a StrFile saved as UTF-8, its name is config.dpir-str-utf8;
for a StrFile saved as UTF-16 LE, its name is config.dpir-str-utf16le;
for a StrFile saved as UTF-16 BE, its name is config.dpir-str-utf16be.
A single DPIR can contain multiple StrFiles saved with different encodings.
Each string in a StrFile is terminated with \n and is accessed by an index starting from 1; index 0 is reserved for the empty string ("").
Therefore the line number of a line is also the index of the string saved on that line.
DataFile
A DataFile's name has the form name.dpir-data, where name is the source file's name.
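The naming rules for both files can be sketched in Go as follows. The helper names are hypothetical, and I assume here that name is the source file's name without its extension, as the config.json example suggests.

```go
package main

import (
	"path/filepath"
	"strings"
)

// baseName strips the directory and the extension from a source path,
// e.g. "conf/config.json" becomes "config". (Assumption: the "name"
// part of a DPIR file name excludes the source extension.)
func baseName(src string) string {
	base := filepath.Base(src)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// strFileName builds the StrFile name for one string encoding,
// e.g. encode = "utf8" or "utf16le".
func strFileName(src, encode string) string {
	return baseName(src) + ".dpir-str-" + encode
}

// dataFileName builds the DataFile name.
func dataFileName(src string) string {
	return baseName(src) + ".dpir-data"
}
```

For config.json this yields config.dpir-str-utf8 (with encode set to "utf8") and config.dpir-data.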
The DataFile format is described below in Go-like syntax:
// structure of the whole file; fields are aligned to the block size.
type full_file struct{
    fileHead file_head // fileHead holds some global metadata.
    root array_table/map_table/list_table/struct_table/string/integer/float/complex
    // root is the root of the whole data structure; it can be an array table, a map table, a list table, a struct table, a string, an integer, a floating-point number, or a complex number.
    // it is followed by the subsequent tables (if root is a table).
}
//--------------------------------------------------
//*******************************************************************
type file_head struct{
    blockSizeLog uint8 // specifies the block size: blockSize = 1 << blockSizeLog.
    fileHeadSize uint8 // size of file_head, in blocks.
    majorVersion uint8 // major version.
    minorVersion uint8 // minor version.
    generalEncodeFlag byte // specifies the UTF and ASCII encodings in use; flags can be combined with '|'.
    charSetFlag byte // specifies the character set.
    encodeFormatFlag byte // specifies the encodings of the character set given by charSetFlag; flags can be combined with '|'.
    rootTypeFlag byte // specifies the type of root.
    sonKeyTypeFlag byte // (when rootTypeFlag is TYPE_MAP) specifies the keyType of all entries in the root table.
    sonValueTypeFlag byte // (when rootTypeFlag is TYPE_ARRAY or TYPE_MAP) specifies the valueType of all entries in the root table.
    reserveFileHead [n]byte // reserved space, used to align file_head to the block size.
}
// generalEncodeFlag:
const(
ASCII = 1<<iota
UTF8
UTF16LE
UTF16BE
UTF32LE
UTF32BE
)
// charSetFlag:
const(
    CHINESE = 1+iota // Chinese character set.
    JAPANESE // Japanese character set.
// and so on
)
// encodeFormatFlag:
const(
FLAG1 = 1<<iota
FLAG2
FLAG3
FLAG4
FLAG5
FLAG6
FLAG7
FLAG8
)
// For example:
// When charSetFlag is CHINESE, FLAG1 represents GBK, FLAG2 represents GB2312, FLAG3 represents GB18030, ...
// When charSetFlag is JAPANESE, FLAG1 represents Shift_JIS, FLAG2 represents EUC_JP, FLAG3 represents ISO-2022-JP, ...
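Because every generalEncodeFlag value is a single bit, a reader can test each one with a bitwise AND. The sketch below lists the StrFile suffixes announced by a flag byte, assuming that each set bit corresponds to one StrFile. The function name is hypothetical, and only the utf8, utf16le and utf16be suffixes come from the StrFile examples; the others are assumed by analogy.

```go
package main

// Flag values copied from the generalEncodeFlag constants.
const (
	ASCII = 1 << iota
	UTF8
	UTF16LE
	UTF16BE
	UTF32LE
	UTF32BE
)

// encodeSuffixes pairs each flag with a StrFile name suffix.
// "ascii", "utf32le" and "utf32be" are assumptions made by analogy.
var encodeSuffixes = []struct {
	flag   byte
	suffix string
}{
	{ASCII, "ascii"},
	{UTF8, "utf8"},
	{UTF16LE, "utf16le"},
	{UTF16BE, "utf16be"},
	{UTF32LE, "utf32le"},
	{UTF32BE, "utf32be"},
}

// strFileSuffixes returns the suffix of every encoding whose bit is
// set in generalEncodeFlag, in the order of the constants.
func strFileSuffixes(generalEncodeFlag byte) []string {
	var out []string
	for _, e := range encodeSuffixes {
		if generalEncodeFlag&e.flag != 0 {
			out = append(out, e.suffix)
		}
	}
	return out
}
```

For the combined flag UTF8|UTF16BE, strFileSuffixes returns utf8 and utf16be.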
// typeFlag: values for rootTypeFlag, keyTypeFlag, valueTypeFlag, sonKeyTypeFlag and sonValueTypeFlag.
const(
TYPE_ARRAY = iota // array type
TYPE_MAP // map type
TYPE_LIST // list type
TYPE_STRUCT // struct type
TYPE_STRING // string type
TYPE_INT8 // and so on...
TYPE_INT16
TYPE_INT32
TYPE_INT64
TYPE_INT128
TYPE_INT256
TYPE_UINT8
TYPE_UINT16
TYPE_UINT32
TYPE_UINT64
TYPE_UINT128
TYPE_UINT256
TYPE_FLOAT16
TYPE_FLOAT32
TYPE_FLOAT64
TYPE_FLOAT128
TYPE_FLOAT256
TYPE_FLOAT512
TYPE_COMPLEX32
TYPE_COMPLEX64
TYPE_COMPLEX128
TYPE_COMPLEX256
TYPE_COMPLEX512
TYPE_COMPLEX1024
)
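To show how a reader might consume file_head, here is a minimal Go sketch that decodes its ten fixed bytes and derives the block size. The names fileHead, parseFileHead, blockSize and rootOffset are hypothetical. The sketch assumes that all ten fields are always present, in the declared order, and that root starts right after the block-aligned header; the upper limit on blockSizeLog is an arbitrary sanity check, not part of the format.

```go
package main

import "fmt"

// fileHead mirrors the fixed fields of file_head (reserve bytes omitted).
type fileHead struct {
	BlockSizeLog, FileHeadSize, MajorVersion, MinorVersion uint8
	GeneralEncodeFlag, CharSetFlag, EncodeFormatFlag       byte
	RootTypeFlag, SonKeyTypeFlag, SonValueTypeFlag         byte
}

const fixedHeadLen = 10 // ten one-byte fields

// parseFileHead decodes the fixed part of file_head from data.
func parseFileHead(data []byte) (fileHead, error) {
	if len(data) < fixedHeadLen {
		return fileHead{}, fmt.Errorf("file head needs %d bytes, got %d", fixedHeadLen, len(data))
	}
	h := fileHead{
		BlockSizeLog: data[0], FileHeadSize: data[1],
		MajorVersion: data[2], MinorVersion: data[3],
		GeneralEncodeFlag: data[4], CharSetFlag: data[5], EncodeFormatFlag: data[6],
		RootTypeFlag: data[7], SonKeyTypeFlag: data[8], SonValueTypeFlag: data[9],
	}
	if h.BlockSizeLog > 30 { // arbitrary limit chosen for this sketch
		return fileHead{}, fmt.Errorf("blockSizeLog %d is too large", h.BlockSizeLog)
	}
	return h, nil
}

// blockSize applies blockSize = 1 << blockSizeLog.
func (h fileHead) blockSize() int {
	return int(1) << h.BlockSizeLog
}

// rootOffset is the byte offset of root, assuming root directly
// follows the header, whose size is given in blocks.
func (h fileHead) rootOffset() int {
	return int(h.FileHeadSize) * h.blockSize()
}
```

For a header that starts with the bytes 4 and 1, the block size is 16 bytes and root would begin at byte offset 16.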
//----------------------------------------------------------------------
//*******************************************************************
// array table:
type array_table struct{
    entryNum uint16 // number of entries in the current table.
    sonKeyTypeFlag byte // this field exists only when the value of every entry in the current table is a map_table (as specified by the father entry's sonValueTypeFlag); it specifies the keyType of all entries in the son tables.
    sonValueTypeFlag byte // this field exists only when the value of every entry in the current table is a map_table or an array_table; it specifies the valueType of all entries in the son tables.
    entrys [entryNum]array_entry // entries of the current table.
    reserveTable [n]byte // reserved space, used to align the current table to the block size.
}
// array entry:
type array_entry struct{
    value [valueSize]byte
    // depending on the father entry's sonValueTypeFlag:
    // if it is TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 that specifies the son table's offset, relative to the end of the current table, in blocks.
    // if it is TYPE_STRING, then value is a uint16 that specifies the index of the string in the *StrFile*.
    // if it is any other type (e.g. int64, complex128), then value is the actual value of this entry.
}
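The offset rule for son tables can be illustrated with a little arithmetic. The Go sketch below is only one possible reading: the function names are hypothetical, the byte order is assumed to be little-endian (the format description does not state one), and the end of a table is taken to be its start plus its size rounded up to the block size.

```go
package main

import "encoding/binary"

// alignUp rounds n up to the next multiple of blockSize.
func alignUp(n, blockSize int) int {
	return (n + blockSize - 1) / blockSize * blockSize
}

// arrayTableEnd returns the byte offset just past an array_table.
// headerLen is the size of entryNum plus any sonKeyTypeFlag and
// sonValueTypeFlag bytes that are present (2 to 4 bytes).
func arrayTableEnd(tableStart, headerLen, entryNum, valueSize, blockSize int) int {
	return tableStart + alignUp(headerLen+entryNum*valueSize, blockSize)
}

// sonTableStart resolves an entry whose value refers to a son table.
// entry must hold at least 2 bytes: a uint16 offset counted in
// blocks from the end of the current table (little-endian assumed).
func sonTableStart(entry []byte, tableEnd, blockSize int) int {
	rel := binary.LittleEndian.Uint16(entry)
	return tableEnd + int(rel)*blockSize
}
```

For example, with 16-byte blocks, an array table that starts at byte 16 with a 3-byte header and five 2-byte entries ends at byte 32, and an entry value of 2 then points to a son table starting at byte 64.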
//---------------------------------------------------------------------------------------
//*******************************************************************************************
// map table:
type map_table struct{
    entryNum uint16 // number of entries in the current table.
    sonKeyTypeFlag byte // this field exists only when the value of every entry in the current table is a map_table (as specified by the father entry's sonValueTypeFlag); it specifies the keyType of all entries in the son tables.
    sonValueTypeFlag byte // this field exists only when the value of every entry in the current table is a map_table or an array_table; it specifies the valueType of all entries in the son tables.
    entrys [entryNum]map_entry // entries of the current table.
    reserveTable [n]byte // reserved space, used to align the current table to the block size.
}
// map entry
type map_entry struct{
    key [keySize]byte // the current entry's key; its type and size are specified by the father entry's sonKeyTypeFlag.
    // if the father entry's sonKeyTypeFlag is TYPE_STRING, then key is a uint16 that specifies the index of the string in the *StrFile*.
    value [valueSize]byte // the current entry's value, depending on the father entry's sonValueTypeFlag:
    // if it is TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 that specifies the son table's offset, relative to the end of the current table, in blocks.
    // if it is TYPE_STRING, then value is a uint16 that specifies the index of the string in the *StrFile*.
    // if it is any other type (e.g. int64, complex128), then value is the actual value of this entry.
}
//-------------------------------------------------------------------------------------
//*************************************************************************************
// list table:
type list_table struct{
    entryNum uint16 // number of entries in the current table.
    entrys [entryNum]list_entry // entries of the current table.
    exValues [n]byte // actual values of types larger than 32 bits (e.g. int64, complex128).
    reserveTable [n]byte // reserved space, used to align the current table to the block size.
}
// list entry:
type list_entry struct{
    valueTypeFlag byte // specifies the valueType of the current entry.
    sonKeyTypeFlag byte // this field exists only when valueTypeFlag is TYPE_MAP; it specifies the keyType of all entries in the son table.
    sonValueTypeFlag byte // this field exists only when valueTypeFlag is TYPE_MAP or TYPE_ARRAY; it specifies the valueType of all entries in the son table.
    value [n]byte // the "value" of the current entry.
    reserveEntry [n]byte // reserved space, used to pad the current entry to 5 bytes.
    // depending on valueTypeFlag:
    // if it is TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 that specifies the son table's offset, relative to the end of the current table, in blocks.
    // if it is TYPE_STRING, then value is a uint16 that specifies the index of the string in the *StrFile*.
    // if it is a number type no larger than 32 bits (int8-32, uint8-32, float32), then value is the actual value of the current entry.
    // if it is a number type larger than 32 bits, then value is a uint32 that specifies the offset of the actual value, relative to the start of exValues, in bytes.
}
//---------------------------------------------------------------------------------------
//*******************************************************************************************
// struct table:
type struct_table struct{
    entryNum byte // number of entries in the current table.
    entrys [entryNum]struct_entry // entries of the current table.
    exValues [n]byte // actual values of types larger than 32 bits (e.g. int64, complex128).
    reserveTable [n]byte // reserved space, used to align the current table to the block size.
}
// struct entry:
type struct_entry struct{
    valueTypeFlag byte // specifies the valueType of the current entry.
    key uint16 // index of the key string in the *StrFile*.
    sonKeyTypeFlag byte // this field exists only when valueTypeFlag is TYPE_MAP; it specifies the keyType of all entries in the son table.
    sonValueTypeFlag byte // this field exists only when valueTypeFlag is TYPE_MAP or TYPE_ARRAY; it specifies the valueType of all entries in the son table.
    value [n]byte // the "value" of the current entry.
    reserveEntry [n]byte // reserved space, used to pad the current entry to 7 bytes.
    // depending on valueTypeFlag:
    // if it is TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 that specifies the son table's offset, relative to the end of the current table, in blocks.
    // if it is TYPE_STRING, then value is a uint16 that specifies the index of the string in the *StrFile*.
    // if it is a number type no larger than 32 bits (int8-32, uint8-32, float32), then value is the actual value of the current entry.
    // if it is a number type larger than 32 bits, then value is a uint32 that specifies the offset of the actual value, relative to the start of exValues, in bytes.
}
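To make the fixed-size entries more concrete, here is a minimal Go sketch that decodes one 5-byte list_entry for three representative types. It is only an illustration: decodeListEntry is a hypothetical name, little-endian byte order is assumed (the format description does not state one), and the type flag values follow from the iota order of the typeFlag constants.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// First typeFlag values, in the same iota order as the format description.
const (
	TYPE_ARRAY = iota
	TYPE_MAP
	TYPE_LIST
	TYPE_STRUCT
	TYPE_STRING
	TYPE_INT8
	TYPE_INT16
	TYPE_INT32
	TYPE_INT64
)

const listEntrySize = 5 // every list_entry is padded to 5 bytes

// decodeListEntry decodes one list_entry. exValues is the exValues
// area of the same list_table. It returns a uint16 StrFile index for
// TYPE_STRING, an int32 for TYPE_INT32 and an int64 for TYPE_INT64.
func decodeListEntry(entry, exValues []byte) (interface{}, error) {
	if len(entry) != listEntrySize {
		return nil, fmt.Errorf("list entry must be %d bytes, got %d", listEntrySize, len(entry))
	}
	le := binary.LittleEndian // assumed byte order
	switch entry[0] {
	case TYPE_STRING:
		// value is the index of the string in the StrFile.
		return le.Uint16(entry[1:3]), nil
	case TYPE_INT32:
		// values of at most 32 bits are stored inline.
		return int32(le.Uint32(entry[1:5])), nil
	case TYPE_INT64:
		// larger values live in exValues; value is a byte offset.
		off := le.Uint32(entry[1:5])
		if uint64(off)+8 > uint64(len(exValues)) {
			return nil, fmt.Errorf("exValues offset %d out of range", off)
		}
		return int64(le.Uint64(exValues[off : off+8])), nil
	default:
		return nil, fmt.Errorf("type flag %d is not handled in this sketch", entry[0])
	}
}
```

A struct_entry could be decoded the same way, with the 2-byte key read right after valueTypeFlag and an entry size of 7 bytes.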