Sequence Classification

Data format

We always assume that we are dealing with HMM-type data, i.e. there is a one-to-one correspondence between observations and labels. Because every observation carries its own label, the data can be kept either as sequences or as one flat collection, so we can speak of the Sequential (or Seq11) format and the Flat format:

Sequential: a collection of $N$ sequences, where the $k$-th sequence has $N_k$ elements, each element being a $D$-dimensional vector. Each vector is labelled with an integer.
Flat: simply a collection of $\sum_{k=1}^{N} N_k$ vectors, with the same number of labels.

Matlab format

Sequential data: the Matlab struct X has four fields:
X.data
X.labels
X.origlabels
X.comments
X.origlabels is a vector of the integers used as class labels. The labels in X.labels{:} always run from 1 to L = length(X.origlabels), but the labels the user supplies may be different (for example -1 and 1). X.origlabels keeps track of them: an internal label k in X.labels actually refers to X.origlabels(k). X.comments is empty or a cell array of comments. X.data is a cell array in which X.data{i} is an $N_i \times D$ (sparse or dense) matrix representing the $N_i$ vectors of the i-th sequence.
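The toolbox itself is written in Matlab; the following is only a Python sketch of the same struct, meant to make the label bookkeeping and the sequential/flat relationship concrete. The class SeqData and the helpers make_seqdata, user_label and to_flat are hypothetical names, the data is kept dense, and sorting origlabels ascending is an assumption (it matches what Matlab's unique() returns, but the documentation does not fix the order).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Matrix = List[List[float]]  # N_i x D, stored densely as a list of rows


@dataclass
class SeqData:
    """Hypothetical Python mirror of the Matlab struct X (dense data only)."""
    data: List[Matrix]        # data[i]: the N_i vectors of sequence i
    labels: List[List[int]]   # labels[i]: internal labels 1..L, one per vector
    origlabels: List[int]     # origlabels[k-1]: user label for internal label k
    comments: List[str] = field(default_factory=list)


def make_seqdata(data: List[Matrix], user_labels: List[List[int]]) -> SeqData:
    """Remap arbitrary integer labels to 1..L, remembering the originals.

    Assumption: origlabels is sorted ascending, as Matlab's unique() would
    return it; the documentation does not specify the order.
    """
    if len(data) != len(user_labels):
        raise ValueError("need one label list per sequence")
    for rows, labs in zip(data, user_labels):
        if len(rows) != len(labs):  # HMM-type data: one label per vector
            raise ValueError("each vector needs exactly one label")
    origlabels = sorted({lab for labs in user_labels for lab in labs})
    index = {lab: k + 1 for k, lab in enumerate(origlabels)}  # 1-based, like Matlab
    labels = [[index[lab] for lab in labs] for labs in user_labels]
    return SeqData(data, labels, origlabels)


def user_label(x: SeqData, k: int) -> int:
    """Internal label k means X.origlabels(k) (Matlab indexing is 1-based)."""
    return x.origlabels[k - 1]


def to_flat(x: SeqData) -> Tuple[Matrix, List[int], List[int]]:
    """Concatenate the sequences into one flat collection of sum_k N_k vectors.

    Also returns the lengths N_k, which are needed to restore the boundaries.
    """
    vectors = [row for rows in x.data for row in rows]
    labels = [lab for labs in x.labels for lab in labs]
    lengths = [len(rows) for rows in x.data]
    return vectors, labels, lengths
```

With user labels -1 and 1, origlabels becomes [-1, 1], so internal label 1 stands for -1 and internal label 2 for 1. The flat view on its own has no record of where one sequence ends and the next begins, which is why the sketch returns the lengths $N_k$ alongside it.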
File format for sequential data

A file can vary along the following options (see the examples below):

Tag/Space-separated. In tag-separated files, sequences are delimited by <sequence length=...> ... </sequence> tags. In space-separated files, they are separated by blank lines.
Sparse/Dense. In sparse files, each vector is stored as index:value pairs (indices start at 1, and zero entries may be omitted). In dense files, all D values are stored.
DimHeader (present or absent). If present, the line <dimension=D> appears on the first line.
LabHeader (present or absent). If present, the line <labels=L1 L2 ... Ln> appears after the dimension header, if there is one.

Each data line holds one vector: its label followed by its values.

Examples. The data below is a collection of two sequences containing 3 and 4 two-dimensional vectors, respectively. Each of the 7 vectors in the collection is labelled -1 or 1.

Tag-separated, dense, DimHeader, LabHeader

    <dimension=2>
    <labels -1 1>
    <sequence length=3>
    1 0.335 0.312
    -1 0.121 -0.112
    1 0.954 0
    </sequence>
    <sequence length=4>
    -1 -0.45 -0.1
    -1 -1.21 0.24
    1 0.00 4.21
    -1 -0.12 -1.2141
    </sequence>

Tag-separated, dense, no headers

    <sequence length=3>
    1 0.335 0.312
    -1 0.121 -0.112
    1 0.954 0
    </sequence>
    <sequence length=4>
    -1 -0.45 -0.1
    -1 -1.21 0.24
    1 0.00 4.21
    -1 -0.12 -1.2141
    </sequence>

Space-separated, dense, LabHeader

    <labels -1 1>
    1 0.335 0.312
    -1 0.121 -0.112
    1 0.954 0

    -1 -0.45 -0.1
    -1 -1.21 0.24
    1 0.00 4.21
    -1 -0.12 -1.2141

Space-separated, sparse, DimHeader

    <dimension=2>
    1 1:0.335 2:0.312
    -1 1:0.121 2:-0.112
    1 1:0.954

    -1 1:-0.45 2:-0.1
    -1 1:-1.21 2:0.24
    1 2:4.21
    -1 1:-0.12 2:-1.2141

Space-separated, sparse, no headers

    1 1:0.335 2:0.312
    -1 1:0.121 2:-0.112
    1 1:0.954

    -1 1:-0.45 2:-0.1
    -1 1:-1.21 2:0.24
    1 2:4.21
    -1 1:-0.12 2:-1.2141

Files

seqread.m: reads data in sequential or flat format from a text file.
seqwrite.m: writes data in sequential or flat format to a text file.
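To make the file options concrete, here is a minimal Python reader for the text format. It is not seqread.m, whose interface is not documented here: the name read_seq_text, its return layout, and the one-vector-per-line assumption are hypothetical. It detects tag- versus space-separated and sparse versus dense input line by line, checks each length=... against the vectors actually read, and accepts both the `<labels=...>` spelling from the header description and the `<labels ...>` spelling used in the examples.

```python
import re
from typing import Dict, List, Optional, Tuple

Row = Dict[int, float]  # 1-based dimension index -> value


def read_seq_text(text: str):
    """Parse the sequential text format into (sequences, D, header_labels).

    Hypothetical sketch, not seqread.m. Assumes one vector per line, written as
    'label v1 v2 ... vD' (dense) or 'label i:v i:v ...' (sparse, 1-based i).
    Returns each sequence as a (rows, labels) pair with dense rows.
    """
    dim: Optional[int] = None
    header_labels: Optional[List[int]] = None
    parsed: List[Tuple[List[Row], List[int]]] = []
    rows: List[Row] = []
    labels: List[int] = []
    expected: Optional[int] = None  # length=... from a <sequence> tag

    def close() -> None:
        """Finish the current sequence, if it has any vectors."""
        nonlocal rows, labels, expected
        if expected is not None and len(rows) != expected:
            raise ValueError(f"expected {expected} vectors, got {len(rows)}")
        if rows:
            parsed.append((rows, labels))
        rows, labels, expected = [], [], None

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            close()  # blank line separates sequences (space-separated files)
            continue
        m = re.fullmatch(r"<dimension=(\d+)>", line)
        if m:
            dim = int(m.group(1))
            continue
        # Accept both '<labels=-1 1>' and '<labels -1 1>' spellings.
        m = re.fullmatch(r"<labels[=\s]\s*(.*)>", line)
        if m:
            header_labels = [int(t) for t in m.group(1).split()]
            continue
        m = re.fullmatch(r"<sequence length=(\d+)>", line)
        if m:
            close()
            expected = int(m.group(1))
            continue
        if line.startswith("</sequence"):  # closing tag ends a tagged sequence
            close()
            continue
        label, *values = line.split()
        labels.append(int(label))
        if any(":" in v for v in values):  # sparse: index:value pairs
            pairs = (v.split(":") for v in values)
            rows.append({int(i): float(x) for i, x in pairs})
        else:  # dense: all D values in order
            rows.append({j + 1: float(x) for j, x in enumerate(values)})
    close()

    if dim is None:  # no DimHeader: infer D from the largest index seen
        dim = max((max(r) for rs, _ in parsed for r in rs if r), default=0)
    sequences = [
        ([[r.get(j, 0.0) for j in range(1, dim + 1)] for r in rs], labs)
        for rs, labs in parsed
    ]
    return sequences, dim, header_labels
```

Read this way, the tag-separated dense example and the space-separated sparse example yield the same two sequences. Without a DimHeader the sketch infers D from the largest index it sees, so a sparse file whose last dimension is zero in every vector would come out one column short; that is one situation where including the dimension header matters.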