当前位置:网站首页>100 deep learning cases | day 41 - convolutional neural network (CNN): urbansound 8K audio classification (speech recognition)
100 deep learning cases | day 41 - convolutional neural network (CNN): urbansound 8K audio classification (speech recognition)
2022-04-23 16:21:00 【Classmate K】
- Running environment :python3
- author :K Students
- From column :《 Deep learning 100 example 》
- Select columns :《 Novice introduction and deep learning 》
- Recommendation column :《Matplotlib course 》
- 🧿 Excellent column :《Python introduction 100 topic 》
My environment :
- Language environment :Python3.6.5
- compiler :jupyter notebook
- Deep learning environment :TensorFlow2.4.1
- The graphics card (GPU):NVIDIA GeForce RTX 3080
- Data address :【 Portal 】
Our code flow chart is as follows :
List of articles
One 、 preparation
Hello everyone , I am a K Students !
Today, I would like to share with you a practical case of audio classification .
The data set used is UrbanSound8K, The dataset contains data from 10 There are three categories of urban sound 8732 A tag sound excerpt (<=4s):air_conditioner、car_horn、children_playing、dog_bark、drilling、enginge_idling、gun_shot、jackhammer、siren and street_music, Data exist separately fold1-fold10 Wait in ten folders .
In addition to sound excerpts , One is also provided CSV file , It contains metadata about each excerpt .
Methods to introduce
-
Yes 3 A basic method of extracting features from audio files :
a) Using audio files mffcs data
b) Use the spectrum image of audio , Then convert it into data points ( Just like you did with the image ). This can be used Librosa Of mel_spectogram function Easy to finish
c) Combine these two features to build a better model . ( It takes a lot of time to read and extract data ). -
I choose to use the second method .
-
Labels have been converted to classification data for classification .
-
CNN Has been used as the main layer for classifying data
1. Import the required Library
# Basic Libraries
import pandas as pd
import numpy as np
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, MaxPool2D, Dropout
from tensorflow.keras.utils import to_categorical
import os,glob,skimage,librosa
import librosa.display
import warnings
warnings.filterwarnings("ignore") # Ignore the warning
2. Analyze data type and format
analysis CSV data
df = pd.read_csv("./41-data/UrbanSound8K.csv")
df.head()
slice_file_name | fsID | start | end | salience | fold | classID | class | |
---|---|---|---|---|---|---|---|---|
0 | 100032-3-0-0.wav | 100032 | 0.0 | 0.317551 | 1 | 5 | 3 | dog_bark |
1 | 100263-2-0-117.wav | 100263 | 58.5 | 62.500000 | 1 | 5 | 2 | children_playing |
2 | 100263-2-0-121.wav | 100263 | 60.5 | 64.500000 | 1 | 5 | 2 | children_playing |
3 | 100263-2-0-126.wav | 100263 | 63.0 | 67.000000 | 1 | 5 | 2 | children_playing |
4 | 100263-2-0-137.wav | 100263 | 68.5 | 72.500000 | 1 | 5 | 2 | children_playing |
Name
-
slice_file_name: Audio file name . The naming format is : [fsID]-[classID]-[occurrenceID]-[sliceID].wav
- [fsID]: Extract extracts from ( fragment ) Recorded Freesound ID
- [classID]: Category ID
- [occurrenceID]: A numeric identifier , Sounds used to distinguish different events in the original recording
- [sliceID]: A numeric identifier , Used to distinguish different slices obtained from the same event
-
fsID: Extract extracts from ( fragment ) Recorded Freesound ID
-
start: original Freesound The start time of the clip in the recording
-
end: original Freesound The end time of the slice in the recording
-
salience: The voice ( subjective ) Significance level . 1 = prospects ,2 = background .
-
fold: altogether 1-10,10 A folder
-
classID: Digital identifier of the sound category :
- 0 = air_conditioner
- 1 = car_horn
- 2 = children_playing
- 3 = dog_bark
- 4 = drilling
- 5 = engine_idling
- 6 = gun_shot
- 7 = jackhammer
- 8 = siren
- 9 = street_music
Use Librosa Analyze random sound samples
a,b = librosa.load() Return value :
- a: Audio signal value , The type is ndarray
- b: Sampling rate
3. Data presentation
import IPython.display as ipd
ipd.Audio('./41-data/fold5/100263-2-0-117.wav')
dataSample, sampling_rate = librosa.load('./41-data/fold5/100032-3-0-0.wav')
plt.figure(figsize=(10, 3))
D = librosa.amplitude_to_db(np.abs(librosa.stft(dataSample)), ref=np.max)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()
arr = np.array(df["slice_file_name"])
fold = np.array(df["fold"])
cla = np.array(df["class"])
for i in range(192, 197, 2):
plt.figure(figsize=(8, 2))
path = './41-data/fold' + str(fold[i]) + '/' + arr[i]
data, sampling_rate = librosa.load(path)
D = librosa.amplitude_to_db(np.abs(librosa.stft(data)), ref=np.max)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title(cla[i])
Two 、 Feature extraction and data set construction
Let's see how to use it librosa.feature.melspectrogram()
The data extracted by the function shape
arr = librosa.feature.melspectrogram(y=data, sr=sampling_rate)
arr.shape
(128, 173)
1. Data feature extraction
feature = []
label = []
def parser():
# Load the file and extract the features
for i in range(8732):
if i%1000 == 0:
print(" Has been extracted %d Data characteristics "%i)
file_name = './41-data/fold' + str(df["fold"][i]) + '/' + df["slice_file_name"][i]
X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
# Extract the spectrum to form an image array
mels = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T,axis=0)
feature.append(mels)
label.append(df["classID"][i])
print(" Data feature extraction is complete !")
return [feature, label]
temp = parser()
Has been extracted 0 Data characteristics
Has been extracted 1000 Data characteristics
Has been extracted 2000 Data characteristics
Has been extracted 3000 Data characteristics
Has been extracted 4000 Data characteristics
Has been extracted 5000 Data characteristics
Has been extracted 6000 Data characteristics
Has been extracted 7000 Data characteristics
Has been extracted 8000 Data characteristics
Data feature extraction is complete !
temp_numpy = np.array(temp).transpose()
X_ = temp_numpy[:, 0]
Y_ = temp_numpy[:, 1]
X = np.array([X_[i] for i in range(8732)])
Y = to_categorical(Y_)
print(X.shape, Y.shape)
(8732, 128) (8732, 10)
2. Dataset construction
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state = 1)
X_train = X_train.reshape(6549, 16, 8, 1)
X_test = X_test.reshape(2183, 16, 8, 1)
input_dim = (16, 8, 1)
3、 ... and 、 Build models and train
model = Sequential()
model.add(Conv2D(64, (3, 3), padding = "same", activation = "tanh", input_shape = input_dim))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), padding = "same", activation = "tanh"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(1024, activation = "tanh"))
model.add(Dense(10, activation = "softmax"))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
model.fit(X_train, Y_train, epochs = 90, batch_size = 50, validation_data = (X_test, Y_test))
Epoch 1/90
131/131 [==============================] - 3s 4ms/step - loss: 1.5368 - accuracy: 0.4717 - val_loss: 1.3617 - val_accuracy: 0.5144
Epoch 2/90
131/131 [==============================] - 0s 2ms/step - loss: 1.1502 - accuracy: 0.6091 - val_loss: 1.1119 - val_accuracy: 0.6326
......
131/131 [==============================] - 0s 2ms/step - loss: 0.0481 - accuracy: 0.9835 - val_loss: 0.8535 - val_accuracy: 0.8653
Epoch 89/90
131/131 [==============================] - 0s 2ms/step - loss: 0.0511 - accuracy: 0.9818 - val_loss: 0.7716 - val_accuracy: 0.8694
Epoch 90/90
131/131 [==============================] - 0s 2ms/step - loss: 0.0502 - accuracy: 0.9829 - val_loss: 0.8673 - val_accuracy: 0.8630
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 16, 8, 64) 640
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 4, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 8, 4, 128) 73856
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 2, 128) 0
_________________________________________________________________
dropout (Dropout) (None, 4, 2, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 1024) 0
_________________________________________________________________
dense (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_1 (Dense) (None, 10) 10250
=================================================================
Total params: 1,134,346
Trainable params: 1,134,346
Non-trainable params: 0
_________________________________________________________________
predictions = model.predict(X_test)
score = model.evaluate(X_test, Y_test)
print(score)
69/69 [==============================] - 0s 1ms/step - loss: 0.8673 - accuracy: 0.8630
[0.8672816753387451, 0.8630325198173523]
版权声明
本文为[Classmate K]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204231612566942.html
边栏推荐
- The first line and the last two lines are frozen when paging
- Postman batch production body information (realize batch modification of data)
- Gartner predicts that the scale of cloud migration will increase significantly; What are the advantages of cloud migration?
- 保姆级Anaconda安装教程
- Nanny Anaconda installation tutorial
- volatile的含义以及用法
- What is homebrew? And use
- Hypermotion cloud migration helped China Unicom. Qingyun completed the cloud project of a central enterprise and accelerated the cloud process of the group's core business system
- What is cloud migration? The four modes of cloud migration are?
- What is the experience of using prophet, an open source research tool?
猜你喜欢
Research and Practice on business system migration of a government cloud project
MySQL - execution process of MySQL query statement
Website pressure measurement tools Apache AB, webbench, Apache jemeter
The biggest winner is China Telecom. Why do people dislike China Mobile and China Unicom?
Nacos 详解,有点东西
面试题 17.10. 主要元素
How important is the operation and maintenance process? I heard it can save 2 million a year?
Sail soft calls the method of dynamic parameter transfer and sets parameters in the title
Intersection, union and difference sets of spark operators
Implement default page
随机推荐
[key points of final review of modern electronic assembly]
Compile, connect -- Notes
Hyperbdr cloud disaster recovery v3 Version 2.1 release supports more cloud platforms and adds monitoring and alarm functions
捡起MATLAB的第(2)天
【Pygame小游戏】10年前风靡全球的手游《愤怒的小鸟》,是如何霸榜的?经典回归......
Read the meaning of serial port and various level signals
Oak-d raspberry pie cloud project [with detailed code]
OAK-D树莓派点云项目【附详细代码】
[section 5 if and for]
Start Oracle service on Linux
一文掌握vscode远程gdb调试
捡起MATLAB的第(5)天
js正則判斷域名或者IP的端口路徑是否正確
力扣-746.使用最小花费爬楼梯
Detailed explanation of gzip and gunzip decompression parameters
5分钟,把你的Excel变成在线数据库,神奇的魔方网表excel数据库
保姆级Anaconda安装教程
Grbl learning (II)
JIRA screenshot
Introduction notes to PHP zero Foundation (13): array related functions