Tone Recognition in Mandarin
Dinoj Surendran and Gina-Anne Levow and Yi Xu

Phase 1 : Studying the interaction of tone and focus on a clean, focus-marked, lab speech data set

Data : tonefocus_is05.mat (zipped, 1.7Mb) : Matlab file with this data from Xu (1999), analyzed in Surendran, Levow & Xu (Proc. ICSLP/Eurospeech 2005), henceforth referred to as SLX'05. If you ever use it, call it... say... XuTF99.
Each of the 11520 data points is a Mandarin syllable in lab speech
that Xu collected for his 1999 Journal of Phonetics paper from eight native
speakers (four male, four female).
The above is all you need to predict tone from pitch (within the
syllable anyway). But, we continue...
Tone Recognition using Focus (Surendran,
Levow, Xu), Proceedings of Eurospeech/ICSLP 2005
Partiview 3d model
(download, unzip, click, pray, see readme)
showing the performance of the baseline SVM on tone recognition. It applies
the Parametric Embedding algorithm (Iwata et al., 2004) to the probability
estimates that LIBSVM outputs with the "-b" flag (more documentation on
this coming).
Another 3d model: tonefocus_slx05.zip : has the same data (4-th fold), but with different classifiers created for three different focus condition groups (predicted by the confidence of tone classification). The attributes are "focuscond" with values 0 (no-focus), 1 (pre-focus), 2 (in-focus), 3 (post-focus) and "tone" with values 1,2,3,4. It was made with
load tonefocus_is05.mat
load splitbyconftonepred3LIN.mat
for i=1:2880,pic4{i}=sprintf('syllz%d.sgi',test4{4}(i));end
ndaona('publish','tonefocus_slx05', 'classprobs',FORFOLDS_lin.pe{4}, 'picdir','./images', 'pics',pic4,'attrib',[focus_npip(test4{4}) L(test4{4})],'attribnames',{'focuscond','tone'}, 'classes',L(test4{4}),'classnames',{'hi','rise','lo','fall'},'glyphsize',5);

Some Results

One SVM for everything

RBF kernel: results_rbf_alltogether.mat. This was created with
[pred,wts,nc,cm,acc,probest,optsused] = batchtest(X,L,train4,test4,opts);
opts =
doscale: 0
getweights: 0
doweight: 1.0000e-03
libsvmdir: '/export/d2/scratch/dinoj/libsvm-2.8' % modify this to be the location of your svm-train
nfolds: 5
kernelparam: 'findeach'
kerneltype: 2
libsvmflags: '-m 1000'
doprobest: 1
labels: [1 2 3 4]
The probability estimates by themselves can be found in probest_rbf.dat. It is an 11520 x 4 matrix with the (i,j)-th entry representing the probability (according to LIBSVM's -b calculation) that the i-th point belongs to class j.

Linear kernel: results_lin_alltogether.mat. This was created with the same call as above, but with opts.kerneltype first set to 0. The probability estimates by themselves can be found in probest_lin.dat.

Predicting by groups of predicted focus (when focus known in training)

Using tonefocus_is05_version2.mat

  Name             Size         Bytes  Class
  L            11520x1          92160  double array
  X            11520x20       1843200  double array
  X_PI_ALL     11520x52       4792320  double array
  Ysent_foc     3840x1          30720  double array
  focus_npip   11520x1          92160  double array
  isfocused    11520x1          92160  double array
  sentence     11520x1          92160  double array
  spkr         11520x1          92160  double array
  test4            1x4          92400  cell array
  train4           1x4         276720  cell array

Ysent_foc(j) is the focus condition of the j-th sentence/phrase. It equals n if the phrase has n-focus, where n is between 0 and 3 inclusive.

X_PI_ALL is, like X, a set of features for each syllable. Its columns are described below (and yes, its first 20 columns equal X).
These additional features were created using (initially Xintensity was what is now X_PI_ALL(:,21:40)) :
X_pitch_features = createNBRfeatures(X,[1:3:11520]);
X_intensity_features = createNBRfeatures(Xintensity,[1:3:11520]);
X_PI_ALL = [X Xintensity X_pitch_features X_intensity_features];

isfocused is a binary vector; isfocused(j) is 1 iff the j-th syllable has focus OR the j-th syllable is the final syllable in a 0-focus sentence. (This is so that 0-focus sentences are treated like 3-focus sentences.) It was created using
isfocused=zeros(11520,1);
for i=1:3840,
a=3*(i-1);
if Ysent_foc(i),
isfocused(a+Ysent_foc(i))=1;
else
isfocused(a+3)=1;
end;
end
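The same vector can be built without a loop. The following is only a sketch on toy data, not the code that produced the saved files: Ysent_foc_toy and isfocused_toy are made-up names, and the four focus values are invented for illustration. It assumes, as the loop above does, that sentence i occupies rows 3*(i-1)+1 to 3*i.

```matlab
% Toy stand-in for Ysent_foc: four 3-syllable sentences (hypothetical values).
Ysent_foc_toy = [0; 1; 2; 3];
nsent = numel(Ysent_foc_toy);

% Position of the marked syllable within each sentence.
% 0-focus sentences are mapped to position 3, as in the loop above.
pos = Ysent_foc_toy;
pos(pos == 0) = 3;

% Row of that syllable: sentence i occupies rows 3*(i-1)+1 .. 3*i.
idx = 3 * ((1:nsent)' - 1) + pos;

isfocused_toy = zeros(3 * nsent, 1);
isfocused_toy(idx) = 1;

% Show one row per sentence, one column per syllable position.
disp(reshape(isfocused_toy, 3, nsent)');
```

Reshaping to one row per sentence makes it easy to check by eye that each sentence has exactly one marked syllable.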
Linear SVM
opts2.labels=[0 1];
opts2.kerneltype = 0;
opts2.doweight = 0.001;
opts2.doscale = 0;
opts2.getweights = 0;
opts2.libsvmdir = '/home/dinoj/libsvm-2.8'; % change for your system
opts2.libsvmflags = '-m 1000';
opts2.doprobest = 1;
[pred,wts,nc,cm,acc,probest,optsused] = batchtest(X_PI_ALL,isfocused,train4,test4,opts2); % train focus predictor
pe_isfocused_lin_PI_ALL=zeros(11520,2);
for i=1:4,pe_isfocused_lin_PI_ALL(test4{i},:)=probest{i};end;
conf_focuspred_lin_PI_ALL = pe_isfocused_lin_PI_ALL(:,2); % i-th entry is confidence of focus predictor that i-th syllable is focused
[bestelemwise_predfocused_lin_PI_ALL,bestseqwise_predfocused_lin_PI_ALL]=choosebestscore(conf_focuspred_lin_PI_ALL,[1:3:11520],'last');
linconfpred_pip_PI_ALL = zeros(11520,1);
for i=1:3840,
b=bestseqwise_predfocused_lin_PI_ALL(i);
for j=1:b-1,
linconfpred_pip_PI_ALL(3*(i-1)+j)=1;
end;
linconfpred_pip_PI_ALL(3*(i-1)+b)=2;
for j=b+1:3,
linconfpred_pip_PI_ALL(3*(i-1)+j)=3;
end;
end;
[cm_syll_PI_ALL,nc_syll_PI_ALL]=getcm(focus_npip,linconfpred_pip_PI_ALL,[0:3]);
[cm_sent_PI_ALL,nc_sent_PI_ALL]=getcm(Ysent_foc,bestseqwise_predfocused_lin_PI_ALL,[0:3]);
[FORFOLDS_lin_PI_ALL,FORSPLITS_lin_PI_ALL,DETAILS_lin_PI_ALL,COMBINED_lin_PI_ALL] = splitbysomething (X,L,train4,test4,linconfpred_pip_PI_ALL,{1,2,3},opts);
The result of the above is saved in predfocus_lin_PI_ALL.mat.

RBF Kernel
optsrbf.labels=[0 1];
optsrbf.kerneltype = 2;
optsrbf.kernelparam = 'findeach';
optsrbf.doweight = 0.001;
optsrbf.doscale = 0;
optsrbf.getweights = 0;
optsrbf.libsvmdir = '/export/d2/scratch/dinoj/libsvm-2.8'; % change for your system
optsrbf.libsvmflags = '-m 1000';
optsrbf.doprobest = 1;
optsrbf.nfolds = 5;
[pred,wts,nc,cm,acc,probest,optsused] = batchtest(X_PI_ALL,isfocused,train4,test4,optsrbf); % train focus predictor
save ~/html/projects/tonefocus/predfocus_rbf_PI_ALL.mat pred nc cm acc probest optsused
pe_isfocused_rbf_PI_ALL=zeros(11520,2);
for i=1:4,pe_isfocused_rbf_PI_ALL(test4{i},:)=probest{i};end;
conf_focuspred_rbf_PI_ALL = pe_isfocused_rbf_PI_ALL(:,2); % i-th entry is confidence of focus predictor that i-th syllable is focused
[bestelemwise_predfocused_rbf_PI_ALL,bestseqwise_predfocused_rbf_PI_ALL]=choosebestscore(conf_focuspred_rbf_PI_ALL,[1:3:11520],'last');
rbfconfpred_pip_PI_ALL = zeros(11520,1);
for i=1:3840,
b=bestseqwise_predfocused_rbf_PI_ALL(i);
for j=1:b-1,
rbfconfpred_pip_PI_ALL(3*(i-1)+j)=1;
end;
rbfconfpred_pip_PI_ALL(3*(i-1)+b)=2;
for j=b+1:3,
rbfconfpred_pip_PI_ALL(3*(i-1)+j)=3;
end;
end;
[cm_syll_PI_ALL,nc_syll_PI_ALL]=getcm(focus_npip,rbfconfpred_pip_PI_ALL,[0:3]);
[cm_sent_PI_ALL,nc_sent_PI_ALL]=getcm(Ysent_foc,bestseqwise_predfocused_rbf_PI_ALL,[0:3]);
optsrbf.labels=[1:4];
save ~/html/projects/tonefocus/predfocus_rbf_PI_ALL.mat pred nc cm acc probest optsused *rbf*PI_ALL
[FORFOLDS_rbf_PI_ALL,FORSPLITS_rbf_PI_ALL,DETAILS_rbf_PI_ALL,COMBINED_rbf_PI_ALL] = splitbysomething (X,L,train4,test4,rbfconfpred_pip_PI_ALL,{1,2,3},optsrbf);
save ~/html/projects/tonefocus/predfocus_rbf_PI_ALL.mat pred nc cm acc probest optsused *rbf*PI_ALL
The result of the above is saved in predfocus_rbf_PI_ALL.mat.

Predicting by Confidence-predicted focus (when focus not known during training)

Results of running the below lines are in splitbyconftonepred3LIN.mat.
opts.kerneltype = 0;
opts.labels = [1 2 3 4];
opts.doweight = 0.001;
opts.doscale = 0;
opts.getweights = 0;
opts.libsvmdir = '/home/dinoj/libsvm-2.8'; % change for your system
opts.nfolds = 5;
opts.libsvmflags = '-m 1000';
opts.doprobest = 1;
PElin = load('probest_lin.dat') ;
highestconf_lin = max(PElin');
[bestelemwise,bestseqwise] = choosebestscore(highestconf_lin,[1:3:11520],'last');
linconfpred_pip=zeros(11520,1);
for i=1:3840,
b=bestseqwise(i);
for j=1:b-1,
linconfpred_pip(3*(i-1)+j)=1;
end;
linconfpred_pip(3*(i-1)+b)=2;
for j=b+1:3,
linconfpred_pip(3*(i-1)+j)=3;
end;
end;
[FORFOLDS_lin,FORSPLITS_lin,DETAILS_lin,COMBINED_lin] = splitbysomething (X,L,train4,test4,linconfpred_pip,{1,2,3},opts);
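For readers without the helper functions, here is a rough sketch of the grouping step above on toy data. The behavior of choosebestscore is an assumption here: the sketch supposes it returns, for each 3-syllable sentence, the position of the highest score, with ties going to the last position. All names ending in _toy and the confidence values are made up.

```matlab
% Toy confidences for two 3-syllable sentences (hypothetical values).
conf_toy = [0.6 0.9 0.7, 0.8 0.5 0.8]';
nsent = numel(conf_toy) / 3;

% One column per sentence, one row per syllable position.
C = reshape(conf_toy, 3, nsent);

% Position of the most confident syllable in each sentence. Flipping the
% rows first makes ties resolve to the LAST position (an assumption about
% what the 'last' option of choosebestscore means).
[~, posflipped] = max(flipud(C), [], 1);
best_toy = 4 - posflipped(:);

% Syllables before the chosen one get 1 (pre-focus), the chosen one gets
% 2 (in-focus), and those after it get 3 (post-focus).
P = repmat((1:3)', 1, nsent);
B = repmat(best_toy', 3, 1);
pip_toy = 2 + sign(P - B);
pip_toy = pip_toy(:);
```

In the second toy sentence the first and last syllables tie, so the last one is treated as in-focus and both earlier syllables are labeled pre-focus.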
Results of running the below lines are in splitbyconftonepred3RBF.mat.
opts.kerneltype = 2;
opts.kernelparam = 'findeach'; % other opts as before
PErbf = load('probest_rbf.dat');
highestconf_rbf = max(PErbf');
[bestelemwise,bestseqwise] = choosebestscore(highestconf_rbf,[1:3:11520],'last');
rbfconfpred_pip=zeros(11520,1);
for i=1:3840,
b=bestseqwise(i);
for j=1:b-1
rbfconfpred_pip(3*(i-1)+j)=1;
end;
rbfconfpred_pip(3*(i-1)+b)=2;
for j=b+1:3
rbfconfpred_pip(3*(i-1)+j)=3;
end;
end;
[FORFOLDS_rbf,FORSPLITS_rbf,DETAILS_rbf,COMBINED_rbf] = splitbysomething (X,L,train4,test4,rbfconfpred_pip,{1,2,3},opts);
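As a small illustration of the confidence computation used above, the sketch below takes the row-wise maximum of a toy probability matrix. PE_toy and its values are invented; the real matrices are 11520 x 4 and come from probest_lin.dat or probest_rbf.dat.

```matlab
% Toy stand-in for a probability-estimate matrix: one row per syllable,
% one column per tone (hypothetical values).
PE_toy = [0.70 0.10 0.10 0.10;
          0.20 0.25 0.50 0.05;
          0.10 0.10 0.20 0.60];

% Row-wise maximum: maxprob_toy(i) is the highest class probability for
% syllable i, and tone_toy(i) is the column (tone) that attains it.
[maxprob_toy, tone_toy] = max(PE_toy, [], 2);

% max(PE_toy'), the form used above, gives the same confidences
% as a row vector instead of a column vector.
conf_row = max(PE_toy');
```

The second output of max is the most probable tone for each syllable, which can be compared against L to spot-check the saved predictions.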