VRANNIS

Found some documentation from the old VRANNIS project we worked on back in 1998. VRANNIS is a voice recognition system that identifies a speaker by matching the voice against a set of prototype users.

Voice Recognition Artificial Neural Network Identification System (VRANNIS)


 

Prepared for:

Dr. Michael Stiber

CSSIE-490, Neural Networks

U of Washington, Bothell

 

Submitted by:

 

 

Botello, Drake R.

dbotello@u.washington.edu

Nguyen, Hoai P.

nguyener@u.washington.edu

Do, Khoat V.

kdo@u.washington.edu

Graupmann, Timothy A.

tgraupma@u.washington.edu

 

August 20, 1998

Abstract

The Voice Recognition Artificial Neural Network Identification System (VRANNIS) presented herein is a “speaker identification” Artificial Neural Network (ANN). As such, VRANNIS should not be confused with “word recognition” neural networks, which are inherently far more complex and capable of commercial application.

VRANNIS outputs the personal identity of a speaker’s voice provided that the speaker’s input vector is a member of the prototype vector population. Classification is achieved by comparing the “power spectrum” signature points belonging to the input vector against those belonging to the prototype vector population. In these regards, VRANNIS functions as an artificially intelligent human ear, capable of accurately identifying voices with which it is associatively familiar. Additional time and work would extend the functionality of VRANNIS to encompass reliable identification of non-prototype users.

VRANNIS utilizes a Pentium®, IBM®-compatible PC, an external microphone and Visual C++®/MATLAB® programming tools. The neural network consists of a single perceptron. VRANNIS can classify its four prototype vectors with the following accuracy: PV1 = 100%, PV2 = 76%, PV3 = 83% and PV4 = 100%.

Application Area

Automatic Speech Recognition (ASR) can be broken into three categories: speaker-dependent, speaker-independent and speaker-adaptive. VRANNIS is emblematic of a speaker-dependent ASR, as it is trained to recognize one of four specific speakers. As such, VRANNIS has limited commercial appeal beyond that of a toy.

However, achieving consistent and accurate results within the scope of VRANNIS’s application required a relatively high-level understanding of human speech characteristics, signal processing and analysis techniques, principal component analysis, Visual C++® and MATLAB® programming ability, and neural network architecture knowledge, inter alia.

The goal, or problem space, of VRANNIS entailed developing a neural network to mimic the functionality of the human ear and to approximate its performance in reliably identifying the owner of a voice. A human ear can easily identify a familiar or unfamiliar voice by the process of auditory association, which in essence is a biological ASR neural network. The human ear is also capable of reliably identifying a given speaker’s voice under “delta” conditions, including changed rates of speech, manner of speech and background noise. VRANNIS’s accuracy suffers under similar conditions.

 

User Examples, Exercises and Functionality

VRANNIS program files, located on the accompanying diskette, must first be unzipped and loaded into a “temporary VRANNIS program directory” in MATLAB® before proceeding with any of the user examples or exercises. The authors highly recommend scanning the diskette with an anti-virus utility capable of scanning “unopened” .zip files.

A user can demonstrate VRANNIS’s functionality and efficiency in one of three ways:

  1. For convenience, VRANNIS offers a built-in user presentation located within its program file library. The intention of the presentation is to familiarize the user with most of the significant aspects of VRANNIS functionality and accuracy. The developers of VRANNIS suggest viewing this presentation before attempting user demonstrations 2 or 3 below. To view the presentation, follow the instructions below, keeping in mind that commands are case sensitive:

     a. EDU>> present

     b. Strike any key to scroll forward through the entire presentation

  2. A user can retrieve and load any one of 116 stored prototype .wav files to test the functionality and classification accuracy of VRANNIS. Each of the files is numbered and named. For example, load_sample('tim1.wav') refers to an individual prototype vector that belongs to Tim Graupmann. Twenty-nine individual voice files, numbered 1:29, exist separately for 'tim', 'drake', 'hoai' and 'khoat'. To achieve this aim, follow the instructions below:

     a. variable=load_sample('drake1.wav');

     b. Substitute 'drake(1:29).wav' to load additional 'drake' prototype vectors

     c. Substitute '(hoai, khoat, or tim)1.wav' to load additional member prototype vectors

  3. A user can connect an external microphone to a PC, or use the PC's built-in microphone (if it has one). The user would pronounce “Hello computer” twenty-nine times, storing each utterance as one of the named prototype vectors in VRANNIS. Bear in mind that the user has 8001 milliseconds to complete each utterance. VRANNIS will then identify the user's voiceprint sample as a member of the prototype population. To achieve this goal, follow the instructions below:

     a. gsave_sample('drake1.wav');

     b. Traverse 1 through 29, repeating the words “hello computer”

     c. Be sure to use the name drake, hoai, khoat, or tim for all prototype samples

 

VRANNIS Signal Processing and Design Criteria

Initial Signal Processing

A voiceprint was digitally recorded twenty-nine individual times for each of the four individuals belonging to the prototype population (P1(1:29)…P4(1:29)). Subsequently, each voiceprint was converted from amplitude over time to power (p) over frequency by application of the Fourier transform.

Following, the power spectrum (the total set of p-values) belonging to P1(1:29)…P4(1:29) was reduced from 4,000 p-values to 500 p-values by application of the fft function (see Figure 1, page 9).

Signal processing achieved three (3) important VRANNIS milestones as follows:

  • The digital voice signal was successfully converted from an amplitude domain to a power spectrum domain, hereafter referred to as the p-spectrum.

  • The total number of p-values in the p-spectrum was reduced from 4,000 to 500, which produced notably improved wave pattern features, as illustrated in Figure 1.

  • P1(1:29)…P4(1:29) voiceprint samples, each spanning a p-spectrum of 500 individually indexed p-values, became “principal” prototype voiceprints.
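To make this stage concrete, here is a minimal Python sketch of the conversion, assuming a mono .wav voiceprint; the original work used MATLAB's fft, so the scipy/numpy calls, the function name and the choice to keep the first 500 p-values are illustrative assumptions rather than the original code.

import numpy as np
from scipy.io import wavfile

def power_spectrum(path, n_points=500):
    """Convert a voiceprint from amplitude-over-time to its p-spectrum."""
    rate, samples = wavfile.read(path)   # amplitude domain (assumed mono)
    spectrum = np.fft.rfft(samples)      # Fourier transform of the signal
    power = np.abs(spectrum) ** 2        # power (p-value) at each frequency
    return power[:n_points]              # reduce to the first 500 p-values

p = power_spectrum("drake1.wav")         # one prototype voiceprint, 500 p-values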

Principal Signal Component Processing

Principal voiceprints P1(1:29)…P4(1:29) varied dramatically from one another, as one would expect. Unexpectedly, the individual voiceprints within P1(1:29), P2(1:29), P3(1:29) and P4(1:29) also varied substantially amongst themselves. Thus the p-values in each of the 1:29 voiceprints belonging to P1…P4 required “normalization” before being constructed into reliable prototype vectors (PV1…PV4), each representative of its member's 1:29 voiceprints.

Normalization was achieved by dividing each p-spectrum spanning 500 p-values for PV1…PV4 into fifty discrete sampling windows. Each sampling window contains ten individually indexed p-values. A hypothetical sampling window containing ten indexed p-values is shown in Figure 2 below:

 

An algorithm searches each sampling window (50 windows × 29 recordings) belonging to each voiceprint for P1, P2, P3 and P4, and returns three values for each sampling window: the mean of the ten p-values, the mode of the maximum indexed p-value, and the mean of the mode and the maximum indexed p-value.

Referring to Figure 3, the mean represents all ten p-values, shown as vertical lines. The vertical line intersecting the bold horizontal line represents the mode of the maximum indexed p-value. Once the mode of the maximum indexed p-value is identified, the mean of the maximum indexed p-value is computed to a single value, and the mean of that mode and that mean is computed as the final value for a single sampling window. The equation is written as:

mean{max(p-value index), mode(max(p-value index))}    (4.0)
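Because the notation in equation 4.0 is terse, the following Python sketch shows one reading of the per-window computation, assuming 50 windows of 10 p-values; the combination rule (mean of the window's maximum p-value and its mode) and the function names are our assumptions, not the original MATLAB.

import numpy as np
from scipy import stats

def window_features(power, n_windows=50, window_size=10):
    """Reduce a 500-point p-spectrum to 50 per-window values (eq. 4.0)."""
    windows = power[:n_windows * window_size].reshape(n_windows, window_size)
    feats = []
    for w in windows:
        w_max = w.max()                              # maximum indexed p-value
        w_mode = stats.mode(w, keepdims=False).mode  # mode of the window's p-values
        feats.append((w_max + w_mode) / 2.0)         # mean of the max and the mode
    return np.array(feats)                           # one 50-element feature vector

Averaging these 50-element vectors across a member's 29 recordings would then yield the principal prototype voiceprints PV1…PV4.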

Principal signal component processing achieved two (2) additional design milestones as follows:

  • The 29 individual voice waves belonging to P1, P2, P3 and P4 were each compiled into one principal indexed vector voiceprint (50 × 1), referred to as PV1, PV2, PV3 and PV4. Note that each single vector voiceprint is a function of the significant p-spectrum properties formerly contained in all 29 voiceprints belonging to P1, P2, P3 and P4.

  • The fifty resulting indexed p-values for each of PV1…PV4 represent highly reliable data with which to train VRANNIS.

 

Interim Vector Component Processing

As an input vector passes through each of the signal processing stages discussed above, its p-indexed waveform navigates through each of the indexed p-values belonging to PV1, PV2, PV3 and PV4. This dynamic relationship is illustrated in Figure 4, page 10.

Following, an algorithm measures the distance from each indexed p-value belonging to the input vector to each of the indexed p-values belonging to PV1, PV2, PV3 and PV4. The algorithm returns the prototype vector (PV1, PV2, PV3 or PV4) that is closest in distance to the input vector at each of the fifty p-indexed sampling windows (taking the absolute value compensates for negative distances). In these regards, this vector component processing methodology borrows (partially) from the “distance logic” found in Hamming neural networks.

The algorithm counts the number of occurrences (or hits) at which the input vector coincides with PV1, PV2, PV3 or PV4. A hit is regarded as a 1, whereas a miss is regarded as a -1, which gives rise to four vectors each containing fifty elements. This information is input into a single-layer perceptron for final classification.
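A minimal Python sketch of this hit-counting stage appears below, assuming the 50-element input vector and the four 50-element prototypes produced above; the argmin-based nearest-prototype selection and the final hit-count vote stand in for the perceptron's fixed weights, so the names and tie-breaking behavior are assumptions.

import numpy as np

def hit_vectors(input_vec, prototypes):
    """input_vec: shape (50,); prototypes: shape (4, 50); returns (4, 50) of +/-1."""
    dists = np.abs(prototypes - input_vec)          # absolute distance per window
    nearest = np.argmin(dists, axis=0)              # closest prototype at each window
    hits = -np.ones_like(prototypes)                # a miss is regarded as -1
    hits[nearest, np.arange(input_vec.size)] = 1.0  # a hit is regarded as 1
    return hits

def identify(input_vec, prototypes):
    """The prototype with the most hits across the fifty windows wins."""
    return int(np.argmax(hit_vectors(input_vec, prototypes).sum(axis=1)))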

 

 

VRANNIS Test Results

 

NAME          ACCURACY
Drake 1:29    100%
Hoai 1:29     76%
Khoat 1:29    83%
Tim 1:29      100%

Alternative Approaches

VRANNIS could be implemented using a Hamming neural network, which would conceivably make VRANNIS substantially more powerful, particularly for identifying both prototype and non-prototype users. The addition of a low-pass filter is viewed as secondary to furthering the accuracy of a Hamming neural network; that is, we would first like to develop a Hamming network and test the results before implementing filtering techniques.

 

Limitations

VRANNIS suffers from the inability to identify non-prototype users, as noted above. In addition, VRANNIS’s classification reliability suffers variably if the sampling session:

  • Is not conducted in exactly the same room with the same microphone.
  • Is conducted in the same room, but with a different microphone.
  • Is conducted in a different room, but with the same microphone.

We have not isolated the cause and effect of the foregoing variable circumstances.

If afforded additional time, we believe that energies devoted to replacing VRANNIS’s architecture with a Hamming neural network would most likely eliminate or mitigate the apparent “sensitivities” associated with the current version of VRANNIS.

 

Problems Encountered

As laypersons new to neural networks and voice signal processing, we encountered several challenges, both individually and collectively. However, we regard these instances as applied learning opportunities, which is why we elected to develop a project rather than write a research paper. The new skills and awareness we have “earned” encompass abilities that we did not fully possess before embarking.

Setting philosophical meandering aside, we faced the following problems:

  • We were unable to successfully implement the recurrent layer in the Hamming network, despite very promising results in the feed-forward layer. Nevertheless, the code still exists in a VRANNIS directory. Following a half-day of unsuccessful attempts, we were left with little time to fully investigate this failing.

  • We were unable to employ supervised learning to establish the proper weight matrix and bias values for a perceptron. The dimensions of VRANNIS’s four prototype vectors are 50 × 4. Starting with arbitrary values, MATLAB did not arrive at the proper values after 3,000 iterations. We do not know whether a solution simply doesn’t exist or whether additional iterations were necessary.

Thus we employed weight matrix and bias values that returned any of the four prototype vectors. However, we had to develop a technique to calculate the distances between the input vector and the four prototype vectors in such a way that -1 or 1 would be returned. The “distance technique” we employed approximates the logic of a Hamming network, which is one of the reasons we believe a Hamming network is ideal for VRANNIS. (A sketch of the training attempt appears at the end of this section.)

  • We found it very difficult to find research information suited to novices conducting their first attempt at voice-related ANNs. In this regard we are grateful for the mentoring we received from Dr. Stiber and Dr. Jackels, and for the precious few pages of information we found in books, listed under References.

  • Communication between group members was challenging and hilarious. It was difficult for Hoai and Khoat to understand many American colloquialisms. Similarly, Tim and Drake had difficulty understanding Vietnamese-influenced rates of speech and pronunciations.

Curiously, VRANNIS had little difficulty identifying the voice of any one of us!

With respect to the “problems” listed herein, a few of us would like to meet with you late next week (or the next) to discuss any insights you may wish to share.
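For reference, here is a hedged Python sketch of the supervised perceptron training we attempted; the original attempt used MATLAB's perceptron routines, so the hand-rolled learning rule, the +/-1 targets and the 3,000-iteration cap are illustrative assumptions rather than the original code.

import numpy as np

def train_perceptron(P, T, max_epochs=3000):
    """P: (4, 50) prototype hit vectors; T: (4, 4) matrix of +/-1 target codes."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((T.shape[1], P.shape[1]))  # arbitrary starting weights
    b = rng.standard_normal(T.shape[1])                # arbitrary starting biases
    for _ in range(max_epochs):
        errors = 0
        for p, t in zip(P, T):
            y = np.where(W @ p + b >= 0, 1.0, -1.0)    # hard-limit activation
            e = t - y
            W += np.outer(e, p)                        # perceptron rule: W += e p^T
            b += e
            errors += int(np.any(e != 0))
        if errors == 0:                                # converged on all four prototypes
            break
    return W, b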

 

Future Work

While we have seemingly placed the Hamming network on a pedestal, there are several avenues within the context of VRANNIS, as it presently exists, that seem worthy of investigation. For example:

  • Increase the number of recorded wave files belonging to each prototype member from 29 to 50, and then from 50 to 100, comparing test results each time. No changes would be made to the total p-value spectrum of 500 per prototype member.

  • Change the total p-value spectrum from 500 to 750, and increase the number of indexed sampling windows from 50 to 150. Compare test results. No changes would be made to the total number of recorded wave files belonging to each prototype member (1:29).

  • Change the input voice wave “hello computer” to a single-syllable word, and make the changes suggested above in stages, with testing after each stage.

In essence, we would first attempt to discover whether the accuracy of VRANNIS could be improved for externally voiced utterances simply by increasing the number of recorded voice wave samples. Secondarily, we would increase the p-spectrum field. Finally, we would greatly increase the number of indexed p-value sampling windows.

All test results would be recorded, analyzed and documented. In this regard, “future work” constitutes a “research project” that Tim Graupmann, Drake Botello and possibly others would be willing to undertake in the future, either personally or under a directed study.

Having completed all of the proposed research, we would next focus our efforts on developing a Hamming neural network for VRANNIS. Test results would be compared against each of the preceding methodologies.

 


References

 

 

Hagan, Martin T., Howard B. Demuth, and Mark Beale, Neural Network Design (Boston, MA: PWS Publishing Company, 1996). 3.12-3.13, 4.13-4.20, 11.1-11.23.

Hanselman, Duane, and Bruce Littlefield, The Student Edition of MATLAB, Version 5 User’s Guide (Upper Saddle River, NJ: Prentice-Hall, Inc., 1997). n.p.

Hanselman, Duane, and Bruce Littlefield, The Student Edition of MATLAB, Version 5 User’s Guide, Online. Internet. Available: http://www2.rrz.une/themen/cmp.cal-tech.edu/matlab, n.p. 10 July 1998.

Looney, Carl G., Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists (New York, NY: Oxford University Press, 1997). 80-81, 434-439.

Carpenter, Gail A., and Stephen Grossberg, Pattern Recognition by Self-Organizing Neural Networks (Cambridge, MA and London, England: The MIT Press, 1991). 458.

Danset, Paul T., Speech Recognition Using Neural Networks: Master’s Thesis (Seattle, WA: University of Washington, 1993). 4.

Lea, et al., Trends in Speech Recognition (Englewood Cliffs, NJ: Prentice-Hall, Inc., 1980). 10, 40-43, 108.

Dowla, Farid U., and Leah L. Rogers, Solving Problems in Environmental Engineering and Geosciences (Cambridge, MA: The MIT Press, 1991). 104.


Appendix

 

VRANNIS test results .................................................. Pages 11-14


72 Hour GDC

The next 72 Hour Game Development Competition is coming June 25, 2004. Keep checking the [GDC Board] for updates. Get prepared!
[RULES]

Theme suggestions can be given immediately. On June 11, when I announce it, stage 1 voting will begin. On June 18, stage 2 voting will begin. On June 25, at exactly 12:01 PM EST (that’s right after noon), the results of the voting will be announced and the competition will begin. On June 28, at exactly 12:01 PM EST, the competition will be over and no more entries will be accepted.

June 2004 Code Fest

Judges have been selected and the topic has been chosen. This time contestants will be making sword-based combat games! Some part of the game must include using a sword to engage in combat. While it may sound a little limiting in the creativity department, we encourage the contestants to push it as far as it will go. We’re all looking forward to a very fun competition. Good luck!

Visit [GameDev] for more information.

TagML (design ideas)

Conversation from an AIM message between me and Flax.
tgraupmann648: I had a wicked idea before I fell into deep sleep last night
Flax0000: Oh yer?
Flax0000: What was it?
tgraupmann648: I was thinking about connecting the model editor to a mysql database
tgraupmann648: I have this concept of a virtual workspace
tgraupmann648: So a person has complete control over their workspace
tgraupmann648: But you can flip through other users using the tool to watch what they are making
tgraupmann648: Similar to Linux how you can flip the TTY or workspaces
tgraupmann648: Eventually it could evolve into collaborative modeling
Flax0000: cool
tgraupmann648: On the technical side, I would just need to convert from C++ to C# and add a web reference
tgraupmann648: The web reference could connect to my site using web services that talk to a mysql database
tgraupmann648: I’ll save the idea, short term I’ve added the ability to zoom by scrolling with the mouse

TagML (Intentional Axis Locking)

TagML is a new 3d model editor / animator. The [source] and [build] are publicly available. The latest feature adds the ability to lock an arbitrary axis. As requested, this will give you better precision while animating models. You can also lock by the strafe vector (side to side) and up vector (up and down) relative to the viewports. Or you can lock by axis. Oh, and you can zoom by scrolling the mouse.

ZBrush (Update)

I purchased a license for ZBrush 2.0. It’s just like they say: digital putty. I’m going through every tutorial, and that doesn’t even scratch the surface of what this tool can do. Here is my first attempt at digital sculpting using the Sphere3D method. Like so many other posts at ZBrush Central… “My First Head”…

TagML (latest)

TagML is a new 3d model editor / animator. I have made the [source] and [build] publicly available. The new feature that I just added is a better keyframe display bar. Hopefully this looks like a hybrid of the PhotoShop and Flash styles. If it’s not intuitive, let me know. As always, suggestions are welcome.
Bones (C) The Game Creators Ltd.

Progress Update (TagML)

I recently created a panel base object which the toolbars inherit to allow the use of textures in the toolbar buttons themselves. I moved the keyframe track display into its own panel. I still have the intent that the keyframe display will also be textured. I just finished the code that will allow the animation bar buttons to use textures. And I spent a little time drawing the icons to match the shaded background.
Android (C) The Game Creators Ltd.

HowTO IIS

When you start ASPX and ASMX programming, the default IIS settings are commonly misconfigured, and the solution is not well documented. IIS needs the following command run once before ASP.NET pages and web services will work properly.

This properly configures IIS:
aspnet_regiis.exe -i

aspnet_regiis.exe can normally be found at:
C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322

These are common commands for IIS:
iisreset /start
iisreset /status
iisreset /stop

Progress Update (TagML)

You fans out there have been inquiring about what I’ve been doing in the couple of months since my last post. I’ve been quite busy. During the daylight hours, I’ve been testing the next-generation bidding system at INSP. And in my night hours, I’ve joined FellStorm Software, a game development group, working on an action RPG. As you can see by glancing at the pic to the left, I’ve designed a model editor from scratch, capable of importing 3ds files or creating animated models. Basically it’s an animation tool which saves models in a new TagML format. The tool is unique in that it will take models straight from 3D Studio. The internal structure is all vector based, so it’s portable to OpenGL or DirectX.
Alien Hivebrain (C) The Game Creators Ltd.

HowTo Check the Image Format in C#

This C# snippet shows how to extract image information from a raw byte array: you can easily determine the file type and dimensions.

// Requires a reference to System.Drawing; imageData is a byte[] holding the raw image.
using System.IO;
using System.Drawing;
using System.Drawing.Imaging;

// Check the image dimensions.
Image logoImage = Image.FromStream(new MemoryStream(imageData));
if (logoImage.PhysicalDimension.Height > 100 || logoImage.PhysicalDimension.Width > 80)
        throw m_ef.CreateException("Incorrect Image Dimensions!"); // m_ef: app-specific exception factory

// Determine the file type from the raw format.
ImageType imageType; // ImageType: app-specific enum with JPG and GIF members
if (logoImage.RawFormat.Equals(ImageFormat.Jpeg))
        imageType = ImageType.JPG;
else if (logoImage.RawFormat.Equals(ImageFormat.Gif))
        imageType = ImageType.GIF;

HowTo Setup A MySQL Database

This article shows you how to connect your Perl/PHP scripts to a MySQL database.

BASIC OVERVIEW

    Start by skimming the MySQL site at MySQL.com.

    The download section has medium and max grade versions of the MySQL database, free to download.

    You might even want to get a copy of "MySQL Control Center", which lets you administer the database and its users with a handy GUI.

    Under Contributed APIs, there is a link to "DBI", which lets you connect to MySQL with Perl. There are other drivers there to connect from whatever platform you want.

    IndigoPerl is a good installation that comes with Apache and mod_perl. It's available at indigostar.com.

    It doesn't hurt to get yourself a copy of ActivePerl from activestate.com.

    After installing ActivePerl, just bring up a command prompt and type: "ppm install DBI". That will install all the DBI Perl-related drivers for you. "perldoc DBI" at the command prompt will give you a lot of information about how to use it.

    "ppm install DBD::mysql" is supposed to work, but has a conflict. So you need to use: "ppm install http://theoryx5.uwinnipeg.ca/ppms/DBD-mysql.ppd", which installs the MySQL wrapper that you need.

    You have to be in ./indigoperl/perl/bin and run the command "ipm install DBI". That will install the drivers for IndigoPerl/Apache.

    Similarly, to install the MySQL wrapper for IndigoPerl/Apache, type: "ipm install http://theoryx5.uwinnipeg.ca/ppms/DBD-mysql.ppd".

    Use the admin tool to create user "db_user" with password "password", granting permissions to at least run queries on tables.

    You may need to run mysqld.exe to start the MySQL database. Bring up the client using mysql.exe. Paste the following to create the tables to work with.

    
    create database qafw01;
    
    use qafw01;
    
    -- Drop the child table first so its foreign key does not block the drop.
    drop table TestExecution;
    drop table TestCase;
    
    /*
     ....
     */
    create table TestCase
    (
        TestCaseID   INT AUTO_INCREMENT PRIMARY KEY,
        Description  TEXT,
        RunInterval  INT,
        QueueWebUrl  TEXT,
        AppName      TEXT,
        CommandLine  TEXT,
        TestType     TEXT,
        ScriptOutput TEXT,
        LogType      TEXT,
        CaseStatus   TEXT
    );
    
    /*
     ....
     */
    create table TestExecution
    (
        TestExecutionID   INT AUTO_INCREMENT PRIMARY KEY,
        TestCaseID        INT,
        LastTimeRun       TEXT,
        Result            TEXT,
        ResultDescription TEXT
    );
    
    -- Set up the following constraints.
    
    alter table TestExecution
    add foreign key(TestCaseID)
    references TestCase (TestCaseID);
    
    -- Create some dummy data to query.
    
    INSERT INTO TestCase
    (Description, RunInterval, QueueWebUrl, AppName,
    CommandLine, TestType, ScriptOutput, LogType,
    CaseStatus)
    VALUES ('Verify...', '3600', 'http://...',
    'Search O&O', 'idptests00tc0001.pl',
    'Perl', 'PIPE', '|', 'A');
    

    Here's an example of a Perl script that works for me in IndigoPerl. Just put it in ./indigoperl/apache/cgi-bin/.

    
    #!perl
    
    # In DBD there is "No close statement".
    #
    # Whenever the scalar that holds a database or statement handle
    # loses its value, Msql chooses the appropriate action (frees the
    # result or closes the database connection). So if you want to free
    # the result or close the connection, choose to do one of the following:
    #
    #         undef the handle 
    #
    #         use the handle for another purpose
    #
    #         let the handle run out of scope 
    #
    #         exit the program.
    
    use strict;
    
    #To connect to the mysql database
    use Mysql;
    
    
    #FOR HTTP IO
    use CGI;
    
    #autoflush
    $| = 1;
    
    #HTML FORMAT OUTPUT VARIABLE
    my $co = new CGI;
    
    #START HTML PAGE
    print $co->header;
    
    my $host     = "localhost";
    my $database = "qafw01";
    my $user     = "db_user";
    my $password = "password";
    
    # Connect to database
    my $dbh = Mysql->connect($host, $database, $user, $password);
    
    # Do a query
    my $sql_statement = "SELECT * FROM TestCase LIMIT 1;";
    my $sth = $dbh->query($sql_statement);
    
    # Fetch the result
    my %result = $sth->fetchhash;
    
    # Print the output
    foreach my $key (keys %result)
    {
       print "$key: $result{$key}" . $co->br . "n";
    }
    
    exit;