Acoustic Confusion Matrices

My first graduate school project involved the use of confusion matrices from psycholinguistics experiments. Finding them proved much harder than I expected, and I do not want anyone else to go through the amount of work I had to go through to get them. Many people know only of the Miller-Nicely studies, which is unfortunate: not because there is anything wrong with them, but because there is so much other work out there.

On this site you will also find some of the actual confusion matrices from the original papers. I copied them row by row and checked them column by column, so they should be accurate. However, if you are going to use them in a publication, I strongly encourage you to check the original sources first; please don't hold me responsible for any errors! This is material you can play with while you wait for your local interlibrary loan service to come through with the original papers.

The presence of confusion matrices on this site undoubtedly breaks some copyright laws, but they will remain here until a journal complains. I think the original authors would be quite happy to have them known more widely. However, if I haven't stated that I've contacted the original authors, I haven't.

There is also well-documented Matlab code to help you play with them. I would have written Perl or C++ code, except that I wanted to make plots, and Matlab provides a nice interactive environment for that sort of thing. Finally, there is a list of references to papers relevant to confusion matrices, including papers on how to analyze them.
Papers and Data

JASA refers to the Journal of the Acoustical Society of America. Once again, you are strongly encouraged in all cases to check the original papers if you ever want to use these matrices in a publication. Even if the matrices are copied correctly (all were double-checked, but who knows), you shouldn't use a matrix unless you know the details of the conditions under which the experiment was done.
I have seen the following papers referred to as sources of confusion matrices.
DISC

The notation used on this site is DISC, a format used in the CELEX database. (A quick overview of CELEX files is given by Dirk Janssen.) Again, this is simply a consequence of the project I was working on at the time. The advantage of DISC is that it uses one character per phoneme (unlike, say, ARPABET), which is convenient for programming. DISC characters are often obvious (e.g. p means /p/). The following exceptions for English consonants are most relevant for the matrices here: J means /ch/ (as in the first phoneme of the English word 'cheap'); _ means /jh/ (as in 'jeep'); Z means /zh/ (as in the middle of 'measure'); S means /sh/ (as in 'sheep'); D means /dh/ (as in 'thy'); T means /th/ (as in 'thigh'). Hugo Quene gives details of the full DISC set, amongst other phoneme transcription formats used by CELEX.

Other Confusion Matrices Resources

UCL FIX (scroll down to Feature Information Xfer). To quote part of their blurb: "a set of programs designed to facilitate analysis of confusion matrices by both ordinary and sequential information transfer analysis (SINFA - Wang & Bilger, 1973, JASA, 54[5] 1248-1266)..."

Matlab Code

Matlab utilities. My other Matlab code often depends on these, so if you use any of it, put the utilities in a directory on your Matlab path.
PHONMAT (updated 1 May 2006): a MATLAB class I've found invaluable for analyzing confusion matrices, among other things. There is some documentation.
PHONVEC: the 1-dimensional version of PHONMAT. Also invaluable for functional load analysis.
LABELS: a MATLAB class required by PHONMAT and PHONVEC. For some documentation, see the bottom of the PHONMAT documentation.
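For readers without Matlab, here is a minimal Python sketch of how DISC-labelled matrices and information transfer analysis fit together. It is not PHONMAT, FIX or SINFA. It shows only the ordinary (non-sequential) analysis: it estimates transmitted information T(x;y) from raw counts and divides it by the stimulus entropy H(x), giving a relative transmitted information in the style of the Miller-Nicely feature analysis. It then pools rows and columns by a phonetic feature to see how well that feature alone gets through. Several things are assumptions of this sketch: the function names (`disc_label`, `transmitted_information`, `collapse`, etc.), the ASCII label strings, the feature tables, and the counts themselves. The counts are made up for illustration and are not from any published study. No small-sample bias correction is applied.

```python
import math

# Readable ASCII names for the DISC consonant symbols that are not self-explanatory.
# The label strings ("ch", "jh", ...) are this sketch's own choice, not part of DISC.
DISC_TO_LABEL = {"J": "ch", "_": "jh", "Z": "zh", "S": "sh", "D": "dh", "T": "th"}


def disc_label(symbol):
    """Map one DISC character to a readable label; other symbols pass through."""
    return DISC_TO_LABEL.get(symbol, symbol)


def transmitted_information(counts):
    """Transmitted information T(x;y) in bits, estimated from raw counts.

    Rows are stimuli and columns are responses. No bias correction is applied.
    """
    total = sum(map(sum, counts))
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    t = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                t += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return t


def stimulus_entropy(counts):
    """Entropy H(x) in bits of the stimulus (row) distribution."""
    total = sum(map(sum, counts))
    probs = [sum(row) / total for row in counts]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def relative_transmitted(counts):
    """T(x;y) / H(x): 1.0 means the stimulus is always recovered."""
    return transmitted_information(counts) / stimulus_entropy(counts)


def collapse(counts, labels, feature):
    """Pool rows and columns whose phonemes share a feature value."""
    values = sorted({feature[s] for s in labels})
    index = {v: k for k, v in enumerate(values)}
    pooled = [[0] * len(values) for _ in values]
    for i, stim in enumerate(labels):
        for j, resp in enumerate(labels):
            pooled[index[feature[stim]]][index[feature[resp]]] += counts[i][j]
    return values, pooled


# Made-up counts for illustration only; NOT taken from any published study.
LABELS = ["S", "Z", "T", "D"]  # DISC symbols for /sh/, /zh/, /th/, /dh/
COUNTS = [
    [8, 2, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 6, 4],
    [0, 0, 3, 7],
]
# Hypothetical feature tables for these four fricatives.
VOICING = {"S": "voiceless", "Z": "voiced", "T": "voiceless", "D": "voiced"}
PLACE = {"S": "postalveolar", "Z": "postalveolar", "T": "dental", "D": "dental"}

if __name__ == "__main__":
    print("stimuli:", [disc_label(s) for s in LABELS])
    print("overall:", round(relative_transmitted(COUNTS), 3))
    for name, feature in [("voicing", VOICING), ("place", PLACE)]:
        _, pooled = collapse(COUNTS, LABELS, feature)
        print(name, round(relative_transmitted(pooled), 3))
```

In this toy matrix listeners never confuse place, so the pooled place matrix has a relative transmitted information of exactly 1.0. The voicing errors pull the voicing figure well below that. Sequential analysis (SINFA), as done by FIX, goes further by partialling out features one at a time; this sketch does not attempt that.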