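I came to programming late and sideways, but there is no getting out of it. The C++ function-pointer problem from the day before yesterday finally compiles, but I still only half understand what is going on. As the old poem has it, "my belt hangs looser and I waste away, yet I regret nothing." The first-year grad students and the senior undergrad in the office all think programming is the business of the software and computer science people. The speech synthesis task I handed to B long ago has produced no news at all, and I don't feel like nagging: I have no official standing, I'm not the advisor, and even the advisor isn't chasing it.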
// The lines above were all written yesterday. I went to bed late, so today I feel a bit sluggish.
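I don't have the energy right now to keep systematic notes on programming or reading, and even blogging is more than I can really manage. One look at how little exercise I have done lately shows that this past year has been the most failed year of my life, and things will stay this way for another two or three months before there is any chance of a turn.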
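Busy as I am, I can only jot down scattered bits; I don't have the drive to write complete notes. None of the variables in the espeak source are commented, and reading it is painful. One concept is espeak_data: what data is this? There is a data_path that supplies espeak_data. data_path is just a char*, so in actual use it is presumably a path on disk; judging by the --path entry in the help text quoted here, it should be the directory that contains the espeak-data directory.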
/******************* quoted from the source *************************/
static const char *help_text =
"\nespeak [options] [\"<words>\"]\n\n"
"-f <text file> Text file to speak\n"
"--stdin Read text input from stdin instead of a file\n\n"
"If neither -f nor --stdin, then <words> are spoken, or if none then text\n"
"is spoken from stdin, each line separately.\n\n"
"-a <integer>\n"
"\t Amplitude, 0 to 200, default is 100\n"
"-g <integer>\n"
"\t Word gap. Pause between words, units of 10mS at the default speed\n"
"-k <integer>\n"
"\t Indicate capital letters with: 1=sound, 2=the word \"capitals\",\n"
"\t higher values indicate a pitch increase (try -k20).\n"
"-l <integer>\n"
"\t Line length. If not zero (which is the default), consider\n"
"\t lines less than this length as end-of-clause\n"
"-p <integer>\n"
"\t Pitch adjustment, 0 to 99, default is 50\n"
"-s <integer>\n"
"\t Speed in words per minute, 80 to 450, default is 175\n"
"-v <voice name>\n"
"\t Use voice file of this name from espeak-data/voices\n"
"-w <wave file name>\n"
"\t Write speech to this WAV file, rather than speaking it directly\n"
"-b\t Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit \n"
"-m\t Interpret SSML markup, and ignore other < > tags\n"
"-q\t Quiet, don't produce any speech (may be useful with -x)\n"
"-x\t Write phoneme mnemonics to stdout\n"
"-X\t Write phonemes mnemonics and translation trace to stdout\n"
"-z\t No final sentence pause at the end of the text\n"
"--compile=<voice name>\n"
"\t Compile pronunciation rules and dictionary from the current\n"
"\t directory. <voice name> specifies the language\n"
"--ipa Write phonemes to stdout using International Phonetic Alphabet\n"
"--path=\"<path>\"\n"
"\t Specifies the directory containing the espeak-data directory\n"
"--pho Write mbrola phoneme data (.pho) to stdout or to the file in --phonout\n"
"--phonout=\"<filename>\"\n"
"\t Write phoneme output from -x -X --ipa and --pho to this file\n"
"--punct=\"<characters>\"\n"
"\t Speak the names of punctuation characters during speaking. If\n"
"\t =<characters> is omitted, all punctuation is spoken.\n"
"--split=\"<minutes>\"\n"
"\t Starts a new WAV file every <minutes>. Used with -w\n"
"--stdout Write speech output to stdout\n"
"--voices=<language>\n"
"\t List the available voices for the specified language.\n"
"\t If <language> is omitted, then list all voices.\n";
/*********************** end of quote ****************************/
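The quoted help text is a single C string assembled from many adjacent string literals, which the compiler joins into one. Here is a minimal sketch of that pattern with the `\n` and `\t` escapes written out. The names `help_excerpt` and `count_newlines` are made up for illustration, and the excerpt is shortened, not the full espeak text.

```cpp
#include <cassert>
#include <string>

// Adjacent string literals are concatenated by the compiler, so a long
// message can be written one line per literal.
// Shortened, illustrative excerpt only (hypothetical name).
static const char *help_excerpt =
    "\nespeak [options] [\"<words>\"]\n\n"
    "-a <integer>\n"
    "\t Amplitude, 0 to 200, default is 100\n";

// Hypothetical helper: count the '\n' characters in a help string.
int count_newlines(const char *text) {
    assert(text != nullptr);
    int n = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n') ++n;
    }
    return n;
}
```

Passing `help_excerpt` to `count_newlines` counts the newlines across all three literals at once, since at run time there is only one string.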
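The voice option really is the language selector (in the help text it is -v <voice name>; --voices=<language> only lists the available voices). I tried -vzh, -vde and -ven in turn, and the results were reasonably satisfying.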
-a adjusts the volume (amplitude).
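I have used both of these options in real runs, but I still couldn't see through the source code and was getting impatient, so I decided to step through it with GDB. As soon as I typed run, this came up: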
Starting program: *****
[Thread debugging using libthread_db enabled]
[New Thread 0x…… (LWP 5727)]
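And then... nothing at all. It accepted no GDB commands and showed no response from the program itself; only Ctrl+C would stop it. (In hindsight, with no arguments the program was presumably just waiting for text on stdin, which is what the help text says it does.) Luckily Zlike picked up my SOS and told me that command-line arguments have their own way in when debugging. I looked it up online, and all it takes is: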
run argv1 argv2 …
and that's it. I nearly lost it...
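Why a bare `run` looked like a hang can be sketched from the rule in the quoted help text: with no words, `-f` or `--stdin`, the text is read from stdin. The function below is a hypothetical simplification of that rule, not espeak's actual argument parsing; `pick_input` and its return values are invented for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical simplification of the rule in the help text:
// -f or --stdin choose the source explicitly; otherwise words on the
// command line are spoken; with nothing at all, text comes from stdin.
// Options whose value is a separate argument (e.g. "-v zh") are not
// handled here; attached forms such as "-vzh" are.
std::string pick_input(int argc, const char *const argv[]) {
    assert(argc >= 1 && argv != nullptr);
    bool has_words = false;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--stdin") == 0) return "stdin";
        if (std::strcmp(argv[i], "-f") == 0) return "file";
        if (argv[i][0] != '-') has_words = true;
    }
    return has_words ? "words" : "stdin";
}
```

Under GDB, a plain `run` starts the program with only its name in `argv`, so this sketch returns "stdin" and the process sits waiting for input. `run -vzh hello` supplies the arguments and returns "words".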
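I spent two days chewing on the espeak source; I hate uncommented code. With gdb I worked out that the speech synthesis output should come from espeak_Synth(). As I currently understand it, speech synthesis needs only two basic steps: Initialize and Synthesize. The Initialize step already worked; only after adding Synth() did I get a segmentation fault, so the arguments I passed are probably wrong: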
espeak_Synth(text,size,0,POS_CHARACTER,0,synth_flags,NULL,NULL);
The documentation says this about the synthesis function:
/**************** espeak_Synth, quoted from the source ****************/
ESPEAK_API espeak_ERROR espeak_Synth(const void *text,
size_t size,
unsigned int position,
espeak_POSITION_TYPE position_type,
unsigned int end_position,
unsigned int flags,
unsigned int* unique_identifier,
void* user_data);
/* Synthesize speech for the specified text. The speech sound data is passed to the calling program in buffers by means of the callback function specified by espeak_SetSynthCallback(). The command is asynchronous: it is internally buffered and returns as soon as possible. If espeak_Initialize was previously called with AUDIO_OUTPUT_PLAYBACK as argument, the sound data are played by eSpeak. */
/**************** end of espeak_Synth quote ****************/
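The quoted comment says the sound data reaches the caller "in buffers by means of the callback function", which is a function pointer at work. The sketch below is a stand-alone mock of that pattern, not the espeak API: `SampleCallback`, `mock_synth` and `count_samples` are hypothetical names, and the real callback type in speak_lib.h has a different signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical callback type: receives a buffer of samples plus user data.
// A non-zero return value asks the producer to stop.
typedef int (*SampleCallback)(const short *samples, int count, void *user_data);

// Mock "synthesizer": hands out silent samples in fixed-size chunks through
// the callback and returns how many times the callback was invoked.
int mock_synth(std::size_t total_samples, int chunk, SampleCallback cb, void *user_data) {
    assert(cb != nullptr && chunk > 0);
    std::vector<short> buffer(static_cast<std::size_t>(chunk), 0);
    int calls = 0;
    std::size_t sent = 0;
    while (sent < total_samples) {
        std::size_t n = total_samples - sent;
        if (n > buffer.size()) n = buffer.size();
        ++calls;
        if (cb(buffer.data(), static_cast<int>(n), user_data) != 0) break;
        sent += n;
    }
    return calls;
}

// Example callback: adds up how many samples have arrived so far.
int count_samples(const short *, int count, void *user_data) {
    *static_cast<std::size_t *>(user_data) += static_cast<std::size_t>(count);
    return 0; // 0 means "keep going"
}
```

Calling `mock_synth(1000, 256, count_samples, &received)` with a `std::size_t received = 0` delivers the samples in four chunks. The `void*` user-data pointer is how state gets into a plain function pointer, since it cannot capture anything itself.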
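What I fear most in programming is out-of-bounds access, especially as someone who came to this sideways and has no wish to write code full time; the longer a bug resists, the sleepier I get. At first I suspected the length of the input text, because the function requires the string to be terminated (I had noted `\n`, though the speak_lib.h comments, as far as I can tell, say a zero character), and I was afraid I had miscounted the length. Fortunately the author of speak_lib spelled out that if espeak_Initialize is called with AUDIO_OUTPUT_PLAYBACK, eSpeak plays the speech produced by Synth() itself. The code I had copied earlier had AUDIO_OUTPUT_SYNCHRONOUS in that argument position, so I changed it back, and, ha, it made a sound. That counts as passing the feasibility check for the speech feature. The detailed implementation I leave to whoever comes next; I'm done with it. The most urgent thing now is to get the framework out: off to draw UML and write classes.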
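On the worry about miscounting the text length: assuming (from my reading of the speak_lib.h comments, not verified here) that `text` is zero-terminated and `size` must be at least the byte size of the text data, a small helper keeps the count honest. `synth_size` is a hypothetical name, and the sketch covers only 8-bit or UTF-8 text, not wide characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: byte count to pass as `size` for a zero-terminated
// 8-bit or UTF-8 string, counting the terminating '\0' as well.
std::size_t synth_size(const char *text) {
    assert(text != nullptr);
    return std::strlen(text) + 1;
}
```

The call would then read something like `espeak_Synth(text, synth_size(text), 0, POS_CHARACTER, 0, synth_flags, NULL, NULL);`. Note that a UTF-8 Chinese string takes several bytes per character, so counting characters instead of bytes would come up short.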
I sent Cognitec yet another email and there is still no reply. No reply. Anxious, anxious~~
DES: I saw three coins on the sports field today. Back in primary school that would have made my whole day.
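Damn it. The notes I took in vim on Linux at the office arrived on Windows with none of the line endings converted, and even when opened in win-gvim the Chinese characters come out garbled. I'm in no mood to work through the character-encoding problem, and I can't be bothered to replace the line endings either. Never again will I write blog posts in vim~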