Plume

DPIR: Data Propagating Intermediate Representation

Traditional data propagation generally uses a DXL (Data eXchange Language, e.g. XML, YAML, or JSON) as the data carrier.

This approach has two drawbacks:

  1. DXLs are very high-level: convenient for users to edit, but inconvenient for programs to parse.
  2. There are too many DXLs, and programs that use different DXLs can't exchange data directly.

Here I introduce a better way to propagate data: an intermediate representation designed for data propagation.
I call it DPIR (Data Propagating Intermediate Representation).

DPIR should be low-level enough for programs to access efficiently, and expressive enough to be converted to and from DXLs with no loss of information.

Preliminary Design of DPIR

String encoding is another problem for data propagation. To avoid it, DPIR saves strings in a separate file: the StrFile.
Non-string data is saved in the DataFile.

StrFile

A StrFile's name has the form name.dpir-str-encode,
where name is the source file's name (without its extension) and encode is the string encoding used.

For example, for a DPIR generated from config.json, the StrFile's name should be config.dpir-str-encode.

For a StrFile saved with UTF-8, its name should be config.dpir-str-utf8;

For a StrFile saved with UTF-16 LE, its name should be config.dpir-str-utf16le;

For a StrFile saved with UTF-16 BE, its name should be config.dpir-str-utf16be.

There can be multiple StrFiles, saved with different encodings, in a single DPIR.
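
The naming rule above could be sketched in Go roughly as follows. This is only an illustration: the helper name strFileName is hypothetical, and it assumes that "name" means the source file's base name with its extension removed, as the config.json example suggests.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// strFileName builds a StrFile name of the form name.dpir-str-encode.
// Assumption: "name" is the source file's base name without its extension,
// matching the config.json -> config.dpir-str-utf8 example.
func strFileName(source, encode string) string {
	base := filepath.Base(source)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return name + ".dpir-str-" + encode
}

func main() {
	// A single DPIR may carry several StrFiles, one per encoding.
	for _, enc := range []string{"utf8", "utf16le", "utf16be"} {
		fmt.Println(strFileName("config.json", enc))
	}
}
```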

The strings in a StrFile are terminated with \n and accessed by an index starting from 1; index 0 is used to represent the empty string ("").
Therefore the line number of a line is also the index of the string saved on that line.

DataFile

A DataFile's name has the form name.dpir-data, where name is the source file's name.

Here is a format description of the DataFile in Go-like syntax:
// Structure of the whole file; fields are aligned to the block size.
type full_file struct{
  fileHead    file_head      // fileHead holds some global metadata.
  root        array_table/map_table/list_table/struct_table/string/integer/float/complex
  // The root field is the root of the whole data structure. It can be an array table, a map table, a list table, a struct table, a string, an integer, a floating-point number, or a complex number.

  // Followed by subsequent tables (if root is a table).
}
//--------------------------------------------------









//*******************************************************************
type file_head struct{
  blockSizeLog        uint8    // specifies the block size: blockSize = 1 << blockSizeLog
  fileHeadSize        uint8    // size of file_head, in blocks.
  majorVersion        uint8    // major version.
  minorVersion        uint8    // minor version.
  generalEncodeFlag   byte     // specifies the UTF and ASCII encodings used; flags can be combined with '|'.
  charSetFlag         byte     // specifies the char set.
  encodeFormatFlag    byte     // specifies the encodings of the char set given by charSetFlag; flags can be combined with '|'.
  rootTypeFlag        byte     // specifies the type of root.
  sonKeyTypeFlag      byte     // (when rootTypeFlag is TYPE_MAP) specifies the keyType of all entries in the root table.
  sonValueTypeFlag    byte     // (when rootTypeFlag is TYPE_ARRAY or TYPE_MAP) specifies the valueType of all entries in the root table.
  reserveFileHead     [n]byte  // reserved space, used to align file_head to the block size.
}
// generalEncodeFlag:
const(
  ASCII      =  1<<iota
  UTF8
  UTF16LE
  UTF16BE
  UTF32LE
  UTF32BE
)


// charSetFlag:
const(
  CHINESE      = 1+iota        // Chinese char set.
  JAPANESE                    // Japanese char set.
  // and so on
)


// encodeFormatFlag:
const(
  FLAG1    =  1<<iota  
  FLAG2
  FLAG3
  FLAG4
  FLAG5
  FLAG6
  FLAG7
  FLAG8
)
// For example:
// When charSetFlag is CHINESE, FLAG1 represents GBK, FLAG2 represents GB2312, FLAG3 represents GB18030 ...
// When charSetFlag is JAPANESE, FLAG1 represents Shift_JIS, FLAG2 represents EUC_JP, FLAG3 represents ISO-2022-JP ...





// typeFlag: for rootTypeFlag, keyTypeFlag, valueTypeFlag, sonKeyTypeFlag and sonValueTypeFlag.
const(
  TYPE_ARRAY      =  iota     // array type
  TYPE_MAP                    // map type
  TYPE_LIST                   // list type
  TYPE_STRUCT                 // struct type
  TYPE_STRING                 // string type
  TYPE_INT8                   // and so on...
  TYPE_INT16
  TYPE_INT32
  TYPE_INT64
  TYPE_INT128
  TYPE_INT256
  TYPE_UINT8
  TYPE_UINT16
  TYPE_UINT32
  TYPE_UINT64
  TYPE_UINT128
  TYPE_UINT256
  TYPE_FLOAT16
  TYPE_FLOAT32
  TYPE_FLOAT64
  TYPE_FLOAT128
  TYPE_FLOAT256
  TYPE_FLOAT512
  TYPE_COMPLEX32
  TYPE_COMPLEX64
  TYPE_COMPLEX128
  TYPE_COMPLEX256
  TYPE_COMPLEX512
  TYPE_COMPLEX1024
)
//----------------------------------------------------------------------










//*******************************************************************
// array table:
type array_table  struct{
  entryNum            uint16   // number of entries in the current table.
  sonKeyTypeFlag      byte     // this field only exists when the value of every entry in the current table is a map_table (as specified by the father entry's sonValueTypeFlag); it specifies the keyType of all entries in the son tables.
  sonValueTypeFlag    byte     // this field only exists when the value of every entry in the current table is a map_table or an array_table; it specifies the valueType of all entries in the son tables.
  entrys              [entryNum]array_entry    // entries of the current table.
  reserveTable        [n]byte  // reserved space, used to align the current table to the block size.
}


// array entry:
type  array_entry    struct{
  value  [valueSize]byte
  // Depending on the father entry's sonValueTypeFlag:
  // if it's TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 giving the son table's offset, relative to the end of the current table, in blocks.
  // if it's TYPE_STRING, then value is a uint16 giving the index of the string in the *StrFile*.
  // if it's any other type (e.g. int64, complex128), then value is the actual value of this entry.
}
//---------------------------------------------------------------------------------------









//*******************************************************************************************
// map table:
type  map_table  struct{
  entryNum           uint16            // number of entries in the current table.
  sonKeyTypeFlag     byte              // this field only exists when the value of every entry in the current table is a map_table (as specified by the father entry's sonValueTypeFlag); it specifies the keyType of all entries in the son tables.
  sonValueTypeFlag   byte              // this field only exists when the value of every entry in the current table is a map_table or an array_table; it specifies the valueType of all entries in the son tables.
  entrys             [entryNum]map_entry      // entries of the current table.
  reserveTable       [n]byte            // reserved space, used to align the current table to the block size.
}


// map entry
type  map_entry  struct{
  key      [keySize]byte          // the current entry's key; its type and size are specified by the father entry's sonKeyTypeFlag.
  // if the father entry's sonKeyTypeFlag is TYPE_STRING, then key is a uint16 giving the index of the string in the *StrFile*.

  value    [valueSize]byte        // the current entry's value, depending on the father entry's sonValueTypeFlag:
  // if it's TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 giving the son table's offset, relative to the end of the current table, in blocks.
  // if it's TYPE_STRING, then value is a uint16 giving the index of the string in the *StrFile*.
  // if it's any other type (e.g. int64, complex128), then value is the actual value of this entry.
}
//-------------------------------------------------------------------------------------









//*************************************************************************************
// list table:
type list_table  struct{
  entryNum      uint16      // number of entries in the current table.
  entrys        [entryNum]list_entry  // entries of the current table.
  exValues      [n]byte    // actual values for types larger than 32 bits (e.g. int64, complex128).
  reserveTable  [n]byte    // reserved space, used to align the current table to the block size.
}


// list entry:
type  list_entry    struct{
  valueTypeFlag     byte      // specifies the valueType of the current entry.
  sonKeyTypeFlag    byte      // this field only exists when valueTypeFlag is TYPE_MAP; it specifies the keyType of all entries in the son table.
  sonValueTypeFlag  byte      // this field only exists when valueTypeFlag is TYPE_MAP or TYPE_ARRAY; it specifies the valueType of all entries in the son table.
  value             [n]byte    // the "value" of the current entry.
  reserveEntry      [n]byte    // reserved space, used to pad the current entry to 5 bytes.
  // Depending on valueTypeFlag:
  // if it's TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 giving the son table's offset, relative to the end of the current table, in blocks.
  // if it's TYPE_STRING, then value is a uint16 giving the index of the string in the *StrFile*.
  // if it's a number type of at most 32 bits (int8-32, uint8-32, float32), then value is the actual value of the current entry.
  // if it's a number type larger than 32 bits, then value is a uint32 giving the offset of the actual value, relative to the start of exValues, in bytes.
}
//---------------------------------------------------------------------------------------









//*******************************************************************************************
// struct table:
type  struct_table  struct{
  entryNum      byte    // number of entries in the current table.
  entrys        [entryNum]struct_entry  // entries of the current table.
  exValues      [n]byte  // actual values for types larger than 32 bits (e.g. int64, complex128).
  reserveTable  [n]byte  // reserved space, used to align the current table to the block size.
}


// struct entry:
type  struct_entry  struct{
  valueTypeFlag     byte      // specifies the valueType of the current entry.
  key               uint16    // index of the key string in the *StrFile*.
  sonKeyTypeFlag    byte      // this field only exists when valueTypeFlag is TYPE_MAP; it specifies the keyType of all entries in the son table.
  sonValueTypeFlag  byte      // this field only exists when valueTypeFlag is TYPE_MAP or TYPE_ARRAY; it specifies the valueType of all entries in the son table.
  value             [n]byte    // the "value" of the current entry.
  reserveEntry      [n]byte    // reserved space, used to pad the current entry to 7 bytes.
  // Depending on valueTypeFlag:
  // if it's TYPE_ARRAY, TYPE_MAP, TYPE_LIST or TYPE_STRUCT, then value is a uint16 giving the son table's offset, relative to the end of the current table, in blocks.
  // if it's TYPE_STRING, then value is a uint16 giving the index of the string in the *StrFile*.
  // if it's a number type of at most 32 bits (int8-32, uint8-32, float32), then value is the actual value of the current entry.
  // if it's a number type larger than 32 bits, then value is a uint32 giving the offset of the actual value, relative to the start of exValues, in bytes.
}
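
To make the file_head part of the description more concrete, here is a rough, runnable Go sketch of how a reader might decode it. It is not a reference implementation: the names FileHead, parseFileHead, BlockSize, HeadBytes and encodingNames are hypothetical, and the sketch assumes the ten fixed fields are stored in declaration order, one byte each, with sonKeyTypeFlag and sonValueTypeFlag always present in the head.

```go
package main

import (
	"errors"
	"fmt"
)

// generalEncodeFlag bits, as in the format description.
const (
	ASCII = 1 << iota
	UTF8
	UTF16LE
	UTF16BE
	UTF32LE
	UTF32BE
)

// First few typeFlag values, as in the format description.
const (
	TYPE_ARRAY = iota
	TYPE_MAP
	TYPE_LIST
	TYPE_STRUCT
	TYPE_STRING
)

// FileHead mirrors the fixed fields of file_head (hypothetical Go type).
type FileHead struct {
	BlockSizeLog, FileHeadSize, MajorVersion, MinorVersion uint8
	GeneralEncodeFlag, CharSetFlag, EncodeFormatFlag       byte
	RootTypeFlag, SonKeyTypeFlag, SonValueTypeFlag         byte
}

// fixedHeadLen assumes one byte per field, in declaration order.
const fixedHeadLen = 10

// parseFileHead decodes the fixed fields from the start of a DataFile.
func parseFileHead(b []byte) (FileHead, error) {
	if len(b) < fixedHeadLen {
		return FileHead{}, errors.New("dpir: data too short for file_head")
	}
	return FileHead{
		BlockSizeLog: b[0], FileHeadSize: b[1],
		MajorVersion: b[2], MinorVersion: b[3],
		GeneralEncodeFlag: b[4], CharSetFlag: b[5], EncodeFormatFlag: b[6],
		RootTypeFlag: b[7], SonKeyTypeFlag: b[8], SonValueTypeFlag: b[9],
	}, nil
}

// BlockSize returns 1 << blockSizeLog.
func (h FileHead) BlockSize() int { return int(1) << h.BlockSizeLog }

// HeadBytes returns the size of file_head in bytes (fileHeadSize is in blocks).
func (h FileHead) HeadBytes() int { return int(h.FileHeadSize) * h.BlockSize() }

// encodingNames lists the encodings whose bits are set in generalEncodeFlag.
func encodingNames(flag byte) []string {
	names := []string{"ASCII", "UTF8", "UTF16LE", "UTF16BE", "UTF32LE", "UTF32BE"}
	var set []string
	for i, n := range names {
		if flag&(1<<uint(i)) != 0 {
			set = append(set, n)
		}
	}
	return set
}

func main() {
	// A 16-byte head: block size 16, one block long, version 0.1,
	// UTF-8 and UTF-16 LE StrFiles, root is a map with string keys.
	raw := []byte{4, 1, 0, 1, UTF8 | UTF16LE, 0, 0, TYPE_MAP, TYPE_STRING, TYPE_STRING, 0, 0, 0, 0, 0, 0}
	h, err := parseFileHead(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("block size:", h.BlockSize(), "head bytes:", h.HeadBytes())
	fmt.Println("encodings:", encodingNames(h.GeneralEncodeFlag))
}
```

Under these assumptions, the root table would start at byte offset HeadBytes(), since file_head is padded to a whole number of blocks.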