Several improvements and extensions can be considered:
1) Improve accuracy by increasing the number of hidden-layer units and the amount of training data.
2) Extend the system to speaker-independent, sentence-level recognition.
3) Incorporate a temporal model to handle variations in a speaker's speaking rate.
