Xiaodu Xiaodu is here!
2022-08-05 06:15:00 【Chengyun Technology】
What is Intelligent Speech Recognition?
Simply put, intelligent speech recognition is the process of converting human voice signals into text.
The technologies we encounter every day, such as speech recognition, face recognition, and OCR, all belong to perceptual intelligence within artificial intelligence.
Their core function is to transform information from the physical world into information a computer can process, providing the foundation for subsequent cognitive intelligence.
The hierarchy of needs that speech recognition can meet
01 Information synchronization between people
Once voice information is converted into text, it is no longer bound to the time axis of the audio, so the same amount of information can be taken in much faster by eye than by ear.
02 Search & Semantic Extraction
Semantic modeling is used to retrieve the words or meanings that matter most in a given business scenario, or to extract them and record them in a structured way.
03 Human-Machine Interaction
Users interact with machines or virtual assistants in a more natural way: holding human-like conversations, controlling devices, or getting answers to questions.
04 Data Mining
By clustering the data, or by connecting it with data systems covering other dimensions, value can be mined from the semantic data of individuals, populations, or specific fields.
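As a rough illustration of the "Search & Semantic Extraction" level, the sketch below counts business keywords in a transcript and stores them in a structured record. It is only a keyword-matching stand-in for real semantic modeling; the `CallRecord` schema, the `extract_keywords` helper, and the sample transcript are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    """Structured record built from one transcript (hypothetical schema)."""
    transcript: str
    hits: dict = field(default_factory=dict)  # keyword -> number of occurrences


def extract_keywords(transcript: str, keywords: list) -> CallRecord:
    """Count case-insensitive whole-word occurrences of each business keyword."""
    record = CallRecord(transcript=transcript)
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        count = len(re.findall(pattern, transcript, flags=re.IGNORECASE))
        if count:
            record.hits[kw] = count
    return record


record = extract_keywords(
    "I want a refund. The refund was promised last week, and delivery was late.",
    ["refund", "delivery", "complaint"],
)
print(record.hits)  # {'refund': 2, 'delivery': 1}
```

Keywords that never occur are left out of `hits`, so the record only contains what was actually said.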
Closed-Domain Recognition
1. Definition
The recognition range is a pre-specified set of characters/words.
The algorithm performs speech recognition only within the closed-domain vocabulary preset by the developer and rejects any speech outside that range.
2. Product form
Streaming upload - synchronous acquisition.
3. Typical application scenarios
Scenarios that do not involve multiple rounds of interaction or a wide variety of semantic expressions.
For example, smart home devices and TV boxes with simple command interaction generally support only a handful of voice control commands, such as "open the curtains" or "open the central station".
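The accept/reject behavior in the closed-domain definition can be sketched as a lookup against a preset command set. This is a minimal sketch that assumes a text hypothesis is already available; a production recognizer may apply the restriction inside the decoder instead. The names `COMMANDS`, `normalize`, and `recognize_closed_domain` are hypothetical.

```python
from typing import Optional

# Closed-domain command set preset by the developer (examples from the text).
COMMANDS = {"open the curtains", "open the central station"}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def recognize_closed_domain(hypothesis: str) -> Optional[str]:
    """Return the matched command, or None to reject out-of-range speech."""
    candidate = normalize(hypothesis)
    return candidate if candidate in COMMANDS else None


print(recognize_closed_domain("Open  the Curtains"))   # open the curtains
print(recognize_closed_domain("what's the weather"))   # None
```

Returning `None` for anything outside the set mirrors the rejection of out-of-range speech described in the definition.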
Open-Domain Recognition
1. Definition
No set of recognizable words needs to be specified in advance; the algorithm recognizes speech across the full vocabulary of the language.
2. Product form
1. Streaming upload - synchronous acquisition
The application/software will automatically record the speaker's voice and upload it to the cloud continuously, and the speaker can see the returned text in real time after speaking.
2. Recorded audio file upload - asynchronous acquisition
Audio duration is generally under 3 or 5 hours. The user calls a software interface or hardware platform to pre-record the audio in the specified format, then uploads it through the interface provided by the voice cloud service provider. Once the upload is complete, the connection can be closed. The user obtains the result by polling the voice cloud server or through a callback interface.
3. Recorded audio file upload - synchronous acquisition
Audio duration is generally less than 1 minute. The user pre-records the audio in the specified format and uploads it through the interface provided by the voice cloud service provider.
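The asynchronous upload-then-poll flow can be sketched with an in-memory stand-in for the cloud service. Nothing here is a real provider API: `FakeSpeechCloud`, its `upload` and `query` methods, the status strings, and the placeholder transcript are assumptions made for illustration only.

```python
import itertools
import time


class FakeSpeechCloud:
    """In-memory stand-in for a voice cloud service (hypothetical, not a real API)."""

    def __init__(self, polls_until_done: int = 3):
        self._ids = itertools.count(1)
        self._remaining = {}  # task_id -> polls left before the result is ready
        self._polls_until_done = polls_until_done

    def upload(self, audio: bytes) -> int:
        """Accept a recorded file and return a task id; the client may disconnect."""
        task_id = next(self._ids)
        self._remaining[task_id] = self._polls_until_done
        return task_id

    def query(self, task_id: int) -> dict:
        """Report task status; the transcript appears once processing finishes."""
        self._remaining[task_id] -= 1
        if self._remaining[task_id] > 0:
            return {"status": "processing"}
        return {"status": "done", "text": "<transcript placeholder>"}


def poll_for_result(cloud, task_id, interval_s=0.01, max_polls=10):
    """Poll until the task is done or the poll budget is exhausted."""
    for _ in range(max_polls):
        reply = cloud.query(task_id)
        if reply["status"] == "done":
            return reply["text"]
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")


cloud = FakeSpeechCloud()
task_id = cloud.upload(b"\x00\x01")  # stand-in for pre-recorded audio bytes
print(poll_for_result(cloud, task_id))
```

A callback interface would replace the polling loop: the client registers an endpoint at upload time and the service pushes the result when it is ready. The poll budget (`max_polls`) keeps a client from waiting forever on a task that never completes.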
3. Typical application scenarios
1. Streaming upload: mainly input scenarios, such as input methods and real-time subtitles for conferences or court trials.
2. Asynchronous file upload: subtitling of audio/video that has already been recorded, as well as customer service voice quality inspection and UGC voice content review, where real-time requirements are low.
3. Synchronous file upload: a supplement to the first two, suited to scenarios where the audio recording interface cannot be used to upload real-time audio streams, or where the real-time requirements for obtaining results are relatively high.
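One possible way to read the three scenarios is as a small decision rule. The sketch below is an interpretation of the descriptions above, not a rule stated by any provider; `ProductForm`, `choose_product_form`, and the 60-second threshold (taken from the "generally less than 1 minute" figure) are assumptions.

```python
from enum import Enum


class ProductForm(Enum):
    STREAMING = "streaming upload, synchronous acquisition"
    FILE_ASYNC = "recorded file upload, asynchronous acquisition"
    FILE_SYNC = "recorded file upload, synchronous acquisition"


# Assumed limit, taken from the "generally less than 1 minute" figure in the text.
SYNC_FILE_MAX_SECONDS = 60


def choose_product_form(can_stream: bool, needs_fast_result: bool,
                        duration_s: float) -> ProductForm:
    """Pick a product form following the scenario descriptions in the text."""
    if needs_fast_result and can_stream:
        return ProductForm.STREAMING   # input methods, live subtitles
    if needs_fast_result and duration_s < SYNC_FILE_MAX_SECONDS:
        return ProductForm.FILE_SYNC   # short clip, streaming unavailable
    return ProductForm.FILE_ASYNC      # long recordings, low real-time needs


print(choose_product_form(True, True, 0).name)       # STREAMING
print(choose_product_form(False, True, 20).name)     # FILE_SYNC
print(choose_product_form(False, False, 7200).name)  # FILE_ASYNC
```

Anything that needs a fast result but fits neither streaming nor the short-file limit falls back to asynchronous upload in this sketch, since that is the only form described as handling long recordings.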