AI VOICE PRINT RECOGNITION INTELLIGENT MODULE

AI VOICE PRINT RECOGNITION INTELLIGENT MODULE

Voiceprint recognition is a biometric technology that converts sound signals into electrical signals for computer-based feature extraction and identity verification. Its biological basis lies in the unique acoustic spectrum carried by biological voice signals, which are as unique and stable as fingerprints. Voice signals of the same category also carry unique sound wave spectra. These signals are extracted,classified, and identified through voiceprint recognition technology. The primary tasks in voiceprint recognition include: speech signal processing, voiceprint feature extraction, voiceprint modeling, voiceprint comparison, and decision-making analysis.

Product features

Noise sound type recognition refers to the classification of noise in the environment through machine learning algorithms to determine its possible sources and types. For example, distinguish machine noise, human voice noise, traffic noise, etc.;
The application of AI in noise sound type recognition is mainly reflected in deep learning technology, especially the application of convolutional neural network;
The process involves collecting massive audio data and training deep learning algorithms to extract useful features for model optimization. The input audio is then compared against known voice models by calculating the distance or similarity between its features and the model´s parameters, which helps identify the source of the audio.
For specific application scenarios, such as indoor scene recognition, outdoor scene recognition ， public place scene recognition, office scene recognition, etc., you can also use a dedicated audio processing front end.

Technical parameters

1、Master control chip:Rockchip RK3588
2、CPU: 8-core 64-bit processor
3、Four Cortex-A76 and four Cortex-A55 and a separate NEON coprocessor
4、The Cortex-A76 has a frequency of 2.4GHz and the Cortex-A55 has a frequency of 1.8GHz
5、GPU: Integrated ARM Mali-G610; built-in 3D GPU; compatible with OpenGL ES1.1/2.0/3.2, OpenCL 2.2 and Vulkan 1.2
6、NPU: The embedded NPU supports mixed operations of INT4/INT8/INT16/FP16 with up to 6Top computing power
7、Storage: 8G+64G emmc
8、The device features two HDMI output ports, one input HDMI port, and supports decoding up to 8K@60P video. It includes two PCIe-extended 2.5G Ethernet ports, an M.2 M-Key slot for installing NVMe SSDs, and an M.2 E-Key slot supporting Wi-Fi6/BT modules. Additionally, it offers 2 USB 3.0 ports, 2 USB 2.0 ports, and 2 Type-C ports (one of which is a power port).
9、A PyTorch-based voiceprint recognition model: This deep learning-driven speaker identification system incorporates channel attention mechanisms, information propagation, and aggregation operations. Its architecture features multi-layered temporal domain neural networks (TDNN) at the frame level, a statistical pooling layer, two sentence-level fully connected layers, and a softmax layer with cross-entropy loss function.
10、Feature extraction: pre-weighting-{>} sub-add window-{>} discrete Fourier transform-{>} Mael filter group-{>} inverse discrete Fourier transform
11、Model training set:{>}100,000 training samples

12、Sound types: Sound types are mainly divided into five categories, namely, life noise, construction noise, industrial noise, traffic noise and natural noise, including thunder, dog barking, wind, knocking, insect chirping,bird chirping,frog chirping and other sound subcategories of no less than 50

13、Voice print recognition accuracy:≥90%
14、Recognition response rate: {<}1s
15.Calling mode: Support cloud calling or local terminal calling
16、Technical protocol: HTTP support
17、Interface types: USB,HDMI,SD,RJ45
18、Power interface: TYPE-C
19、Working voltage:5V3A

Classification of voiceprint library

Category I: five categories, natural noise, living noise, construction noise, industrial noise , traffic noise; classification basis: HJ640 standard, noise pollution prevention report, noise EIA, noise law, etc.; Secondary classification: according to the application scenario or common characteristics of sound; Level 3 classification: As the sub-station identification results show, the original sound types are merged and optimized.