Foolhardy Science

A Practitioner from Taiwan

PureInsight | August 27, 2001

Even though I am a scientist by profession, I feel strongly that science is foolhardy. I remember what Archimedes said more than two thousand years ago: "Give me a lever long enough and a prop strong enough, and I can single-handedly move the world." As long as something can be discovered and tools can be found, science will make every effort to dig deeper.

My lab studies phonetic signal processing and focuses on phonetic recognition. In short, phonetic recognition enables machines, including computers, to understand what you say. Outside the human world, living beings in other dimensions would find this research funny and annoying. As mentioned in Zhuan Falun, “In addition to human beings and animals, plants are also lives. Any matter’s life can manifest in other dimensions. When your Celestial Eye reaches the level of Fa Eyesight, you will find that rocks, walls, or anything can talk to you and greet you.” Since things can already understand our words, why do we try every possible way to invent a machine that will listen to us speak?

The procedure of phonetic recognition is as follows. First, we take a piece of human speech as the sample, apply a Fourier transform, pass the signals through a meticulously designed wave filter, and thus obtain a set of characteristic parameters. We then compare these parameters with a large volume of human phonetic data stored in the database. A known pronunciation is attached to each piece of phonetic data in the database. So, after comparison, the pronunciation of the phonetic data that most closely matches the sample is the result of the recognition.
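The procedure described above can be illustrated with a toy sketch. This is not the lab's actual system: the function names, the equal-width frequency bands standing in for the wave filter, the synthetic tones standing in for speech, and the two-entry database are all assumptions made for illustration.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum from a direct (slow) discrete Fourier transform."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(total))
    return mags


def band_features(mags, num_bands):
    """Average the spectrum inside equal-width bands (a crude filter bank)."""
    width = len(mags) // num_bands
    return [sum(mags[b * width:(b + 1) * width]) / width for b in range(num_bands)]


def extract_features(samples, num_bands=4):
    """Reduce a block of samples to a few characteristic parameters."""
    return band_features(dft_magnitudes(samples), num_bands)


def recognize(samples, database):
    """Return the label whose stored parameters are closest to the sample's."""
    feats = extract_features(samples)
    return min(database, key=lambda label: math.dist(feats, database[label]))


def tone(freq_bin, n=64):
    """Hypothetical stand-in for recorded speech: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


# Hypothetical database: stored parameters, each with a known label attached.
database = {
    "low": extract_features(tone(3)),
    "high": extract_features(tone(27)),
}

result = recognize(tone(4), database)
print(result)
```

A real recognizer would use far richer features and a much larger database, but the shape is the same: transform, filter, reduce to parameters, and pick the closest stored entry.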

There are several problems with our approach. First, the sampling procedure hugely complicates the signals. For example, with a 48 kHz sampling frequency, a 5-second sound becomes a sample of 240,000 numbers. Only a computer can read and understand so many numbers. After we transform the sound into this numeric form that a human cannot grasp, we have to treat it with all kinds of filters to clear out the background noise, distinguish the speaker, build a model of the human vocal cords, and so forth. We spend a great deal of advanced mathematics on this problem. Even so, the solution we finally get is only a probability: the pronunciation with the highest probability may be the solution. So, the result of the recognition is that the pronunciation is, maybe, this word. If we do phonetic recognition in a limited field, the rate of correctness is around 70 to 90 percent. What is a limited field? It means that your topic is limited to a certain scope of vocabulary, for example, physical exercise. Once you go beyond this scope, for example, into politics, the rate of correctness of the recognition drops steeply.


It would seem that the first step, sampling, is the beginning of the mistake; it turns a 5-second sound into 240,000 numbers. Since these numbers are too unwieldy to handle, we put the signals through a wave filter, apply a Fourier transform, and move the signals from the time domain to the frequency domain. Even so, the signals are still too complicated. What can we do? We extract the characteristic parameters, reducing the signals to 42 parameters that can be handled. Then we compare these parameters with the data stored in the database. Because too much information was lost during the treatments, the only thing we can do now is calculate probabilities: this pronunciation matches that sound with the highest probability, while that pronunciation has a lower probability. The result of the calculation still doesn’t work very well, so it is then massaged by a language-modeling program. Thus we finally raise the recognition rate to an acceptable level.
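To make the numbers in this paragraph concrete, here is a small sketch of the data reduction and of turning match scores into probabilities. The softmax over negative distance is an assumption chosen for illustration, not necessarily what the lab's system does, and the candidate words and distances are hypothetical.

```python
import math


def match_probabilities(distances):
    """Turn distances into probabilities with a softmax over negative distance."""
    weights = {label: math.exp(-d) for label, d in distances.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Figures taken from the text: 48 kHz sampling, 5 seconds, 42 parameters.
raw_samples = 48_000 * 5
num_parameters = 42
reduction = raw_samples / num_parameters  # numbers discarded per parameter kept

# Hypothetical distances between the sample and three stored pronunciations.
distances = {"cat": 0.5, "cap": 0.9, "cut": 2.0}
probs = match_probabilities(distances)
best = max(probs, key=probs.get)

print(raw_samples, round(reduction))
print(best, probs)
```

The closest entry receives the highest probability, but the others never drop to zero, which is why the answer is only ever "maybe this word".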

Don’t say that this is the spirit of the Old Foolish Man (a legendary figure in Chinese folklore who tried to remove the mountain in front of his home by carrying off one rock at a time). Actually, it should be described as foolhardy work. Why do we use such a foolish method to reach that goal? Because this method is the smartest one offered by science. Since science cannot provide a better solution, the only thing that scientists can do now is to keep working on it foolhardily with this unintelligent method.

A dog can instinctively understand whether his owner is happy, angry, or sad without the benefit of any training; plants have the supernormal ability of mind reading. How come people go against nature and want an electronic crystal ball to “understand” human words?

Translated from:
http://www.zhengjian.org/zj/articles/2001/8/12/11260.html
