100 Deep Learning Cases | Day 41 - Convolutional Neural Network (CNN): UrbanSound8K Audio Classification (Speech Recognition)
2022-04-23 16:21:00 【Classmate K】
- Running environment: Python 3
- Author: Classmate K
- From the column: 《Deep Learning 100 Examples》
- Selected column: 《Beginner's Introduction to Deep Learning》
- Recommended column: 《Matplotlib Tutorial》
- 🧿 Featured column: 《Python Introduction: 100 Questions》
My environment:
- Language: Python 3.6.5
- Editor: Jupyter Notebook
- Deep learning framework: TensorFlow 2.4.1
- Graphics card (GPU): NVIDIA GeForce RTX 3080
- Data source: 【Portal】
Our code flow chart is as follows:
[Figure: code flow chart]
I. Preparation
Hello everyone, I'm Classmate K!
Today I'd like to share a practical audio classification case.
The dataset used is UrbanSound8K. It contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The files are stored in ten folders, fold1 through fold10.
In addition to the sound excerpts, a CSV file containing metadata about each excerpt is provided.
Introduction to the method
- There are 3 basic ways to extract features from audio files:
  a) Use the MFCC (mel-frequency cepstral coefficient) data of the audio file
  b) Use a spectrogram image of the audio and convert it into data points (as you would with an image). This is easy to do with Librosa's melspectrogram function
  c) Combine the two kinds of features to build a better model (reading and extracting the data takes a lot of time)
- I chose the second method.
- The labels are converted to categorical (one-hot) data for classification.
- A CNN is used as the main layer type for classifying the data.
1. Import the required libraries
# Basic Libraries
import pandas as pd
import numpy as np
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPool2D, Dropout
from tensorflow.keras.utils import to_categorical
import os,glob,skimage,librosa
import librosa.display
import warnings
warnings.filterwarnings("ignore") # Ignore the warning
2. Analyze the data types and formats
Parse the CSV data
df = pd.read_csv("./41-data/UrbanSound8K.csv")
df.head()
| | slice_file_name | fsID | start | end | salience | fold | classID | class |
|---|---|---|---|---|---|---|---|---|
| 0 | 100032-3-0-0.wav | 100032 | 0.0 | 0.317551 | 1 | 5 | 3 | dog_bark |
| 1 | 100263-2-0-117.wav | 100263 | 58.5 | 62.500000 | 1 | 5 | 2 | children_playing |
| 2 | 100263-2-0-121.wav | 100263 | 60.5 | 64.500000 | 1 | 5 | 2 | children_playing |
| 3 | 100263-2-0-126.wav | 100263 | 63.0 | 67.000000 | 1 | 5 | 2 | children_playing |
| 4 | 100263-2-0-137.wav | 100263 | 68.5 | 72.500000 | 1 | 5 | 2 | children_playing |
Column descriptions
- slice_file_name: The audio file name, in the format [fsID]-[classID]-[occurrenceID]-[sliceID].wav
  - [fsID]: the Freesound ID of the recording from which the excerpt (slice) was taken
  - [classID]: the numeric class identifier
  - [occurrenceID]: a numeric identifier that distinguishes different occurrences of the sound within the original recording
  - [sliceID]: a numeric identifier that distinguishes different slices taken from the same occurrence
- fsID: The Freesound ID of the recording from which the excerpt (slice) was taken
- start: The start time of the slice in the original Freesound recording
- end: The end time of the slice in the original Freesound recording
- salience: A (subjective) salience rating of the sound. 1 = foreground, 2 = background.
- fold: The fold number (1-10) the file belongs to; the audio is stored in ten folders, fold1 to fold10
- classID: A numeric identifier of the sound class:
  - 0 = air_conditioner
  - 1 = car_horn
  - 2 = children_playing
  - 3 = dog_bark
  - 4 = drilling
  - 5 = engine_idling
  - 6 = gun_shot
  - 7 = jackhammer
  - 8 = siren
  - 9 = street_music
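As a small illustration of the naming format above, the sketch below splits a slice file name into its four fields. The helper `parse_slice_name` and the `SliceInfo` tuple are hypothetical names introduced only for this example; they are not part of the dataset or of any library.

```python
from typing import NamedTuple


class SliceInfo(NamedTuple):
    fs_id: int          # Freesound ID of the source recording
    class_id: int       # numeric class label (0-9)
    occurrence_id: int  # which occurrence of the sound in the recording
    slice_id: int       # which slice of that occurrence


def parse_slice_name(file_name: str) -> SliceInfo:
    """Split '[fsID]-[classID]-[occurrenceID]-[sliceID].wav' into integers."""
    stem = file_name.rsplit(".", 1)[0]  # drop the '.wav' extension
    fs_id, class_id, occurrence_id, slice_id = (int(p) for p in stem.split("-"))
    return SliceInfo(fs_id, class_id, occurrence_id, slice_id)


info = parse_slice_name("100263-2-0-117.wav")
print(info)
```

For the file name in row 1 of the table, `info.class_id` is 2, which matches the `children_playing` entry in the class list.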
Analyze sample sounds with Librosa
a, b = librosa.load() returns:
- a: the audio signal values, as an ndarray
- b: the sampling rate
3. Data presentation
import IPython.display as ipd
ipd.Audio('./41-data/fold5/100263-2-0-117.wav')

dataSample, sampling_rate = librosa.load('./41-data/fold5/100032-3-0-0.wav')
plt.figure(figsize=(10, 3))
D = librosa.amplitude_to_db(np.abs(librosa.stft(dataSample)), ref=np.max)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()
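To make the decibel conversion in the cell above less opaque, here is a rough NumPy-only sketch of what `librosa.amplitude_to_db(..., ref=np.max)` computes, assuming librosa's documented defaults `amin=1e-5` and `top_db=80.0`. The function name `amplitude_to_db_sketch` is hypothetical, and the code is an approximation for intuition rather than librosa's own implementation.

```python
import numpy as np


def amplitude_to_db_sketch(magnitude, amin=1e-5, top_db=80.0):
    """Approximate librosa.amplitude_to_db(magnitude, ref=np.max)."""
    magnitude = np.abs(np.asarray(magnitude, dtype=float))
    ref = max(magnitude.max(), amin)  # the loudest bin maps to 0 dB
    db = 20.0 * np.log10(np.maximum(magnitude, amin) / ref)
    return np.maximum(db, db.max() - top_db)  # limit the dynamic range


S = np.array([[1.0, 0.1], [0.01, 0.0]])
print(amplitude_to_db_sketch(S))
```

Because the reference is the maximum, every value in the plotted spectrogram is at or below 0 dB, and anything more than 80 dB quieter than the peak is clipped to -80 dB.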

arr = np.array(df["slice_file_name"])
fold = np.array(df["fold"])
cla = np.array(df["class"])
for i in range(192, 197, 2):
    plt.figure(figsize=(8, 2))
    path = './41-data/fold' + str(fold[i]) + '/' + arr[i]
    data, sampling_rate = librosa.load(path)
    D = librosa.amplitude_to_db(np.abs(librosa.stft(data)), ref=np.max)
    librosa.display.specshow(D, y_axis='linear')
    plt.colorbar(format='%+2.0f dB')
    plt.title(cla[i])

II. Feature Extraction and Dataset Construction
Let's look at the shape of the data extracted by the librosa.feature.melspectrogram() function
arr = librosa.feature.melspectrogram(y=data, sr=sampling_rate)
arr.shape
(128, 173)
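The shape `(128, 173)` can be checked by hand: 128 is the number of mel bands and 173 is the number of time frames. The sketch below assumes librosa's defaults (`sr=22050` in `librosa.load`, `n_mels=128`, `hop_length=512`, and centered frames in `melspectrogram`) and a clip of exactly 4 seconds. The helper `n_frames` is hypothetical.

```python
def n_frames(duration_s: float, sr: int = 22050, hop_length: int = 512) -> int:
    """Number of frames produced by a centered (padded) short-time transform."""
    n_samples = int(duration_s * sr)
    return 1 + n_samples // hop_length


# 4 s * 22050 Hz = 88200 samples; 88200 // 512 = 172; plus one frame = 173
print(n_frames(4.0))
```

Clips shorter than 4 s produce fewer frames, so the raw spectrograms differ in width. Averaging over the time axis, as the parser in the next step does, yields a fixed 128-value vector regardless of clip length.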
1. Data feature extraction
feature = []
label = []
def parser():
    # Load each file and extract its features
    for i in range(8732):
        if i % 1000 == 0:
            print("Extracted features from %d files" % i)
        file_name = './41-data/fold' + str(df["fold"][i]) + '/' + df["slice_file_name"][i]
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Compute the mel spectrogram and average it over time,
        # giving one 128-value feature vector per clip
        mels = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        feature.append(mels)
        label.append(df["classID"][i])
    print("Feature extraction complete!")
    return [feature, label]
temp = parser()
Extracted features from 0 files
Extracted features from 1000 files
Extracted features from 2000 files
Extracted features from 3000 files
Extracted features from 4000 files
Extracted features from 5000 files
Extracted features from 6000 files
Extracted features from 7000 files
Extracted features from 8000 files
Feature extraction complete!
# Build the arrays directly from the two lists. np.array(temp) would mix
# 128-value vectors with scalar labels, which newer NumPy versions reject
# as a ragged array.
feature, label = temp
X = np.array(feature)
Y = to_categorical(np.array(label))
print(X.shape, Y.shape)
(8732, 128) (8732, 10)
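For readers unfamiliar with `to_categorical`, the following NumPy-only sketch shows the same one-hot idea on a handful of labels. The helper `one_hot` is hypothetical and is not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a (n_samples, num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]


encoded = one_hot([3, 2, 5])    # dog_bark, children_playing, engine_idling
print(encoded.shape)
print(encoded.argmax(axis=1))   # argmax recovers the original class IDs
```

This is why `Y` has shape `(8732, 10)`: one row per clip and one column per class, matching the 10-unit softmax output layer used later.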
2. Dataset construction
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state = 1)
X_train = X_train.reshape(6549, 16, 8, 1)
X_test = X_test.reshape(2183, 16, 8, 1)
input_dim = (16, 8, 1)
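The hard-coded sizes 6549 and 2183 follow from the default `test_size=0.25` of `train_test_split` applied to 8732 samples. The sketch below checks that arithmetic and shows how each 128-value feature vector is laid out as a 16 x 8 single-channel "image". It uses random stand-in data (`X_demo`, a hypothetical name) rather than the real features.

```python
import math

import numpy as np

n_samples = 8732
n_test = math.ceil(0.25 * n_samples)  # scikit-learn rounds the test size up
n_train = n_samples - n_test
print(n_train, n_test)

# Stand-in for X_train: 5 samples with 128 mel features each.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 128))

# -1 lets NumPy infer the sample count instead of hard-coding it.
X_img = X_demo.reshape(-1, 16, 8, 1)
print(X_img.shape)

# Row-major layout: grid cell (i, j) holds mel band 8 * i + j.
assert X_img[0, 2, 3, 0] == X_demo[0, 2 * 8 + 3]
```

Writing `reshape(-1, 16, 8, 1)` in the cells above would avoid hard-coding the split sizes. Note that the 16 x 8 grid is only a convenient arrangement of the 128 time-averaged mel bands; it has no time axis.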
III. Build and Train the Model
model = Sequential()
model.add(Conv2D(64, (3, 3), padding = "same", activation = "tanh", input_shape = input_dim))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), padding = "same", activation = "tanh"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(1024, activation = "tanh"))
model.add(Dense(10, activation = "softmax"))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
model.fit(X_train, Y_train, epochs = 90, batch_size = 50, validation_data = (X_test, Y_test))
Epoch 1/90
131/131 [==============================] - 3s 4ms/step - loss: 1.5368 - accuracy: 0.4717 - val_loss: 1.3617 - val_accuracy: 0.5144
Epoch 2/90
131/131 [==============================] - 0s 2ms/step - loss: 1.1502 - accuracy: 0.6091 - val_loss: 1.1119 - val_accuracy: 0.6326
......
131/131 [==============================] - 0s 2ms/step - loss: 0.0481 - accuracy: 0.9835 - val_loss: 0.8535 - val_accuracy: 0.8653
Epoch 89/90
131/131 [==============================] - 0s 2ms/step - loss: 0.0511 - accuracy: 0.9818 - val_loss: 0.7716 - val_accuracy: 0.8694
Epoch 90/90
131/131 [==============================] - 0s 2ms/step - loss: 0.0502 - accuracy: 0.9829 - val_loss: 0.8673 - val_accuracy: 0.8630
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 16, 8, 64)         640
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 4, 64)          0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 8, 4, 128)         73856
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 2, 128)         0
_________________________________________________________________
dropout (Dropout)            (None, 4, 2, 128)         0
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
_________________________________________________________________
dense (Dense)                (None, 1024)              1049600
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10250
=================================================================
Total params: 1,134,346
Trainable params: 1,134,346
Non-trainable params: 0
_________________________________________________________________
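The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for a convolution with bias and a fully connected layer with bias; the helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return (in_units + 1) * out_units


counts = [
    conv2d_params(3, 3, 1, 64),       # conv2d
    conv2d_params(3, 3, 64, 128),     # conv2d_1
    dense_params(4 * 2 * 128, 1024),  # dense (input is the flattened 4x2x128 map)
    dense_params(1024, 10),           # dense_1
]
print(counts, sum(counts))
```

The first fully connected layer accounts for over 90% of the 1,134,346 parameters. The pooling, dropout, and flatten layers contribute none.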
predictions = model.predict(X_test)
score = model.evaluate(X_test, Y_test)
print(score)
69/69 [==============================] - 0s 1ms/step - loss: 0.8673 - accuracy: 0.8630
[0.8672816753387451, 0.8630325198173523]
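`model.predict` returns one row of 10 softmax probabilities per test clip, and the cell above does not use them further. As a hedged sketch of one way to inspect them, the code below converts probabilities and one-hot labels to class IDs with `argmax` and builds a confusion matrix. The arrays are small made-up stand-ins for `predictions` and `Y_test`, and the helper `confusion_matrix_sketch` is hypothetical.

```python
import numpy as np


def confusion_matrix_sketch(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return matrix


# Made-up stand-ins: 4 samples, 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
one_hot_true = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [1, 0, 0],
                         [0, 0, 1]])

y_pred = probs.argmax(axis=1)
y_true = one_hot_true.argmax(axis=1)
cm = confusion_matrix_sketch(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

With the real arrays, `predictions.argmax(axis=1)` and `Y_test.argmax(axis=1)` would take the place of the stand-ins. The `classification_report` function imported from scikit-learn at the top accepts the same two label vectors and reports per-class precision and recall.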
Copyright notice
This article was written by [Classmate K]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231612566942.html