RT-Thread Smart Vehicle Target Recognition System Tutorial Series: Handwriting Recognition Model (Part 1)

__ Shirley 2021-08-10 11:37:59



This document is quite long, so it has been divided into a series of five articles. Machine learning is not pure software development where you simply call library APIs; it needs some theoretical support. Without any introduction to the theory, it may be unclear why a model is designed the way it is, or what is wrong when a model misbehaves. On the other hand, an overly long document is hard to read patiently, especially the theoretical part with its many formulas. Machine learning does demand both a theoretical foundation and programming ability, but I believe sticking with it will pay off, and I will do my best to balance theory and application.

This article assumes that you already know how to use RT-Thread's env tool to download packages, generate a project, and upload firmware to an STM32 board, because this series focuses on loading general ONNX machine learning models. For RT-Thread basics, see the official documentation center.


First, let's briefly introduce the scope of each topic (domain) mentioned above. Artificial intelligence (Artificial Intelligence) is the broadest topic, as a picture illustrates:



Machine learning (Machine Learning) is the theme of this document, but machine learning itself is still a very large topic:


Here is a brief introduction to the three categories shown above:

Supervised learning (Supervised Learning): This is probably the most widely applied category. Take face recognition as an example: I give you many photos in advance and tell you which of them contain faces and which do not, and you summarize the features of a face from the photos I gave you. This is the training process. Finally, I provide some photos you have never seen before, and if the algorithm is well trained, it can tell whether each photo contains a face. The defining characteristic of supervised learning is the labeled training set, which tells the model what is right and what is wrong.


Unsupervised learning (Unsupervised Learning): An example is an online shopping recommendation system: the model classifies my browsing history and then automatically recommends related products to me. The defining characteristic of unsupervised learning is that there is no standard answer; a water cup can be classified as a daily necessity or as a gift, and neither is wrong.


Reinforcement learning (Reinforcement Learning): This is perhaps the most attractive part of machine learning; for example, Gym has many examples of training a computer to play games by itself and achieve high scores. Reinforcement learning works mainly through trial and error (actions), finding the strategy that maximizes its reward, which is why so many of the examples involve computer games.

The rest of this document is all about supervised learning, because handwriting recognition needs a training set that tells the model which digit each image actually represents. Supervised learning itself splits into two main categories, classification and regression:



Classification (Classification): Handwriting recognition is an example. The characteristic of this kind of problem is that the final result is discrete; the predicted category can only be a value like 0, 1, 2, or 3, never a decimal like 1.414 or 1.732.


Regression (Regression): The classic house price prediction is an example. The results of such problems are continuous: house prices vary continuously, with infinitely many possible values, unlike handwriting recognition with its 10 categories 0-9.

So handwriting recognition is a classification problem. There are many classification algorithms; this article introduces the application of the relatively mature neural network (Neural Network).


Artificial neural network (Artificial Neural Network): This is a general-purpose method that can be used for data fitting in various fields, but for images and audio there are more suitable architectures.


Convolutional neural network (Convolutional Neural Network): Mainly used in the image field; it will be introduced in detail later.


Recurrent neural network (Recurrent Neural Network): Better suited to sequential input such as sound, so it has many applications in speech recognition.

To conclude: this document covers machine learning, the fast-developing branch of artificial intelligence; the problem it solves is a classification problem under supervised learning; and the method it uses is the convolutional neural network (CNN), a kind of neural network.


1 Neural network theory


This part introduces the whole workflow of a neural network: how to prepare the training set, what training is, why training is needed, how to train, and what you get after training.

1.1 Linear regression (Linear Regression)

1.1.1 The regression model

Before a machine learning model can be trained to make predictions, we first need to decide what the model looks like. Take the classic linear regression model as an example; the artificial neural network (ANN) introduced later can in fact be seen as a combination of multiple linear regressions. So what is a linear regression model? Suppose we have some scattered points and hope to find a straight line that fits them; the linear regression model is:

y = kx + b



So if we later have a point x = 3, in a region not covered by these points on the graph, we can still predict the corresponding y. The formula above is usually written in another way, however: the final prediction y is written h_θ (for hypothesis), and the subscript θ represents the training parameters, i.e. k and b. So the model becomes:

h_θ(x) = θ₀ + θ₁x


So θ₀ corresponds to b and θ₁ corresponds to k. But this representation is still not general enough: x may not be one-dimensional. In the classic house price prediction, for example, predicting a price may require the size of the house, the number of rooms, and so on, so a more general form is used:

h_θ(x) = θᵀx = θ₀x₀ + θ₁x₁ + … + θₙxₙ (with x₀ = 1)

This is the linear regression model: just a vector multiplication, which makes the formula easy to compute. By the way, θ carries a transpose, θᵀ, because by convention we use column vectors. This formula is the same thing as y = kx + b, just expressed in a more general, simpler, and more elegant way.
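As a concrete illustration, here is a minimal numpy sketch of the hypothesis h_θ(x) = θᵀx; the parameter and data values are made up for the example:

```python
import numpy as np

# Model parameters: theta[0] is the intercept b, theta[1] is the slope k
theta = np.array([1.0, 2.0])

def h(theta, x):
    """Linear hypothesis h_theta(x) = theta^T x, with x[0] = 1 for the bias term."""
    return theta @ x

# Predict y for x = 3 (prepend the constant 1 so the bias is included)
x = np.array([1.0, 3.0])
y_pred = h(theta, x)  # 1 + 2*3 = 7.0
```

Prepending the constant 1 (x₀ = 1) is what lets the intercept b live inside the same vector multiplication as the slope.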


1.1.2 The evaluation metric

To make the above model fit these scattered points well, our goal is to adjust the model parameters θ₀ and θ₁, i.e. the slope and intercept of the line, so that it reflects the trend of the scatter points well. The animation below shows the training process very intuitively.


You can see that the line starts out nearly horizontal, but its slope and intercept slowly move to a better position. So here comes the question: how do we evaluate whether the current position of the line meets our needs? A very direct idea is to sum, over all scattered points, the squared difference between the actual value y and the prediction h_θ of our model. This evaluation metric is called the loss function (cost function) J(θ):

J(θ) = 1/(2m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²


The division by 2 on the right-hand side is just for convenience when taking derivatives: differentiating the squared term produces a factor of 2, which cancels the 2 in the denominator. Now we have an evaluation metric, and the smaller the loss function, the better. With it we can tell whether the current model meets our needs; the next step is to tell the model how to optimize in a better direction, which is the training (Training) process.
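The loss function above can be sketched in a few lines of numpy (the toy data here is made up, using the common 1/(2m) scaling):

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error J(theta) = 1/(2m) * sum((h_theta(x) - y)^2)."""
    m = len(y)
    predictions = X @ theta
    return np.sum((predictions - y) ** 2) / (2 * m)

# Points that lie exactly on y = 2x + 1 give zero loss
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # first column is the bias term
y = np.array([1.0, 3.0, 5.0])
good = cost(np.array([1.0, 2.0]), X, y)  # 0.0: a perfect fit
bad = cost(np.array([0.0, 0.0]), X, y)   # much larger: a poor fit
```

A perfect fit drives J(θ) to zero, and any worse line gives a strictly larger value, which is exactly what makes it usable as a training target.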

1.1.3 Model training

To make the model parameters θ move in a better direction, the natural idea is to walk downhill. The loss function above is in fact a parabola (a bowl shape), so as long as we keep walking downhill, we will always reach the lowest point of the function:


So what is the "downhill" direction? It is the direction given by the derivative. As the animation above shows, the black dot always moves toward the lowest point along the tangent direction. Taking the derivative of the loss function J(θ) gives:

∂J(θ)/∂θⱼ = 1/m · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾

Now we know which direction θ should move, but how far should it go each step? As in the animation above, even if the black dot knows its direction of motion, the size of each step is still to be determined. This step size is called the learning rate α (learning rate). With both the direction and the step size, each parameter update is:

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ

This training method is the famous gradient descent (Gradient Descent). There are of course many improved training methods, such as Adam, but the underlying principles are much the same, so I won't go into them here.
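Putting the update rule in a loop gives a minimal batch gradient descent sketch (toy data and a hand-picked learning rate, for illustration only):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent: theta := theta - alpha * (1/m) * X^T (X theta - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m  # derivative of J(theta)
        theta -= alpha * gradient             # step downhill by alpha
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column plus one feature
y = np.array([1.0, 3.0, 5.0])                       # generated from y = 2x + 1
theta = gradient_descent(X, y)                       # converges close to [1.0, 2.0]
```

If α is too large the steps overshoot and diverge; too small and training crawls, which is why the learning rate is one of the first knobs to tune.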

1.1.4 summary


To sum up, the machine learning process is: first design a model; then define an evaluation metric called the loss function, so we know how to judge the model; then use a training method to move the model's parameters in a direction that reduces the loss function; and when the loss function almost stops decreasing, we can consider training finished. What training produces is the model's parameters, and with the trained model we can make predictions on other data. By the way, the linear regression above actually has a closed-form theoretical solution that skips the training process entirely and reaches the optimal weights in one step. It is called the Normal Equation:

θ = (XᵀX)⁻¹ Xᵀ y


So, with a theoretical solution available, why do we still need to train step by step? Because the formula above involves a matrix inverse. When the matrix is small, inverting it is cheap, but once the matrix gets large, finding the inverse becomes practically impossible with available computing power, and that is when gradient descent is used to approach the optimal solution step by step.
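For small problems, the normal equation really is a one-liner; here is a sketch on the same toy data as before (using `np.linalg.solve` rather than an explicit inverse, a common numerical-stability choice):

```python
import numpy as np

# Normal equation: theta = (X^T X)^(-1) X^T y -- one step, no iteration
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])  # generated from y = 2x + 1

# Solve (X^T X) theta = X^T y instead of forming the inverse directly
theta = np.linalg.solve(X.T @ X, X.T @ y)  # [1.0, 2.0]
```

Solving the linear system costs roughly O(n³) in the number of features, which is exactly why this route stops scaling once the matrices get large.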

1.2 Nonlinear regression (Logistic Regression)

Let's go back to the handwriting recognition example. The linear regression described above produces a continuous number, but the final goal of handwriting recognition is a discrete value, namely 0-9. So how can that be done?


This is the model from the previous section. It is very simple: just apply one more sigmoid function to the final result to limit it to the range 0-1, and that's it.


As in the formula in the picture above, the sigmoid function is:

g(z) = 1 / (1 + e⁻ᶻ)


Applying it to the linear regression model gives a nonlinear regression model, namely Logistic Regression:

h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))


This guarantees that the final result lies between 0 and 1. We can then define results greater than 0.5 as 1 and results less than 0.5 as 0, turning a continuous output into a discrete one.
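The sigmoid-plus-threshold idea fits in a few lines of numpy (the parameter values below are made up for the example):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    """Logistic regression: sigmoid of the linear model, thresholded at 0.5."""
    probability = sigmoid(theta @ x)
    return 1 if probability > 0.5 else 0

theta = np.array([-1.0, 2.0])            # toy parameters: bias -1, weight 2
class_a = predict(theta, np.array([1.0, 1.0]))  # sigmoid(1) ~ 0.73 -> class 1
class_b = predict(theta, np.array([1.0, 0.0]))  # sigmoid(-1) ~ 0.27 -> class 0
```

The sigmoid output can also be read directly as a probability, which is what the softmax layer later generalizes to ten classes.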

1.3 Artificial neural network (ANN)

We have now introduced the continuous linear regression model (Linear Regression) and the discrete nonlinear regression model (Logistic Regression). The models are very simple; each fits in a single line on paper. So how can such simple models be combined into a genuinely useful neural network? In fact, the models above can already be seen as a neural network with only one layer: we feed in x and, after one calculation, get the output h_θ:

h_θ(x) = g(θᵀx)

What if we don't produce the result straight away, but insert another layer in the middle? Then we have a neural network with one hidden layer.


In the picture above, we use a to represent the output of the activation function (activation function). The activation function here is the sigmoid function mentioned in the previous section, which limits the output to 0-1; without it, after several layers of computation the output could explode to a very large number. Besides sigmoid there are many other activation functions, such as Relu, which is very commonly used in the convolutional neural networks of the next section. In addition, we use a bracketed number to denote the layer; for example a(1) represents the output of the first layer. The first layer is the input layer and performs no computation, so a(1) = x: the activation output of the first layer is simply our input x. Note, however, that θ(1) does not denote the parameters of the first layer, but the parameters between the first and second layers; after all, the parameters live in the computation between two layers. We can therefore summarize the above network structure:
● Input layer: a(1) = x
● Hidden layer: a(2) = g(θ(1) a(1))
● Output layer: h_θ = g(θ(2) a(2))

If we set the output layer to have 10 nodes, it can represent the 10 digits 0-9. And if we add more hidden layers, doesn't it start to look like interconnected neurons?
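The three-layer structure above can be sketched as a forward pass in numpy; the layer sizes here (3 inputs, 4 hidden units, 10 outputs) are made-up toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, theta1, theta2):
    """Forward pass with one hidden layer:
    a(1) = x, a(2) = g(theta(1) a(1)), h = g(theta(2) a(2))."""
    a1 = x                       # input layer: no computation
    a2 = sigmoid(theta1 @ a1)    # hidden layer
    h = sigmoid(theta2 @ a2)     # output layer
    return h

rng = np.random.default_rng(0)
theta1 = rng.normal(size=(4, 3))    # parameters between layer 1 and layer 2
theta2 = rng.normal(size=(10, 4))   # parameters between layer 2 and layer 3
h = forward(rng.normal(size=3), theta1, theta2)  # shape (10,), one score per digit
```

With random untrained parameters the ten outputs are meaningless, of course; training (back propagation) is what turns them into digit scores.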


What if we Go Deeper? (The author mentioned in the paper that the inspiration for going deeper actually came from Inception.)


Then we have a deep neural network:




If you are wondering how many hidden layers to choose, and how many nodes each hidden layer should have: that, like "where do we come from and where are we going", is the ultimate question of neural networks. Finally, neural networks are trained using back propagation (Back Propagation); if you are interested, you can find a more detailed introduction in the references.

1.4 Convolutional neural networks (CNN)

At last we come to the convolutional neural network that will be used later. As the previous introduction shows, the model of a neural network is actually very simple: you don't need much mathematics, just matrix multiplication and function derivatives, and a deep neural network is merely matrix multiplication and activation functions applied over and over:


Repeating the same operation is a bit monotonous. The convolutional neural network introduced below brings in more interesting operations, mainly:
  • Conv2D
  • Maxpooling
  • Relu
  • Dropout
  • Flatten
  • Dense
  • Softmax

Next, we will introduce these operators one by one.

1.4.1 Conv2D

First of all, the biggest feature of neural networks in the image field is the introduction of the convolution operation. Although the name sounds a little mysterious, convolution is actually very simple.

Why introduce convolution at all? Matrix multiplication can solve many problems, but for images it becomes expensive: multiplying a 1920x1080 image means handling a [1, 2073600] matrix, which is a lot of computation, whereas convolution reduces the computation dramatically. On the other hand, compressing a two-dimensional image into a one-dimensional vector loses the information that neighboring pixels (up, down, left, and right) are related to each other; for example, a pixel is usually similar in color to its surroundings, and this spatial information is often very important image information.

Having covered the advantages of convolution, what exactly is it? Convolution is just simple arithmetic. We need an image, and then a convolution kernel (Kernel):


The image above, after being processed by a 3x3 convolution kernel, has its edges extracted very well. The following animation clearly shows the matrix operation:


The convolution kernel used in the animation above is a 3x3 matrix:


If we pause the animation:


It can be seen that the convolution operation simply scans the kernel across the image row by row and column by column, multiplies the numbers at corresponding positions, and sums them. For example, the convolution result 4 in the upper left corner is calculated like this:


Of course, connecting the calculation steps above with equals signs is not rigorous, but it is convenient for explaining how convolution is computed. You can see that, compared with a fully connected neural network, convolution requires very little computation, and it preserves the image's relationships in two-dimensional space, which is why it has so many applications in the image field.

Convolution is very handy, but after a convolution the image gets smaller: the 5x5 matrix above becomes a 3x3 matrix after a 3x3 kernel. So sometimes, to keep the image size unchanged, the image is padded with 0s around the edges; this operation is called padding. However, padding alone cannot always keep the size the same, because the kernel in the animation above moves only one cell at a time in each direction; if it moved 2 cells each step, the 5x5 image convolved with a 3x3 kernel would become a 2x2 matrix. The number of cells the kernel moves each step is called the stride. The formula for the image size after a convolution is:

O = (W − F + 2P) / S + 1


For example, for the image above the width is W = 5, the kernel size is F = 3, no padding is used so P = 0, and the stride is S = 1, giving (5 − 3 + 0)/1 + 1 = 3.


One more note: all the calculations above are for a single convolution kernel. In practice a convolution layer may have multiple kernels, and in many CNN models the number of kernels grows with the depth of the layers.
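The scan-multiply-sum procedure described above can be sketched directly in numpy (a "valid" convolution with no padding; the 5x5 image and averaging kernel are toy values):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """'Valid' 2-D convolution: slide the kernel over the image,
    multiply element-wise at each position, and sum."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1  # output height: (W - F)/S + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.ones((5, 5))        # toy 5x5 input
kernel = np.ones((3, 3)) / 9   # 3x3 averaging kernel (hypothetical example)
out = conv2d(image, kernel)    # shape (3, 3), matching (5 - 3 + 0)/1 + 1 = 3
```

Real frameworks use heavily optimized implementations, but the arithmetic is exactly this loop.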

1.4.2 Maxpooling

The convolution above can keep the image size unchanged with padding, but often we want the image to shrink gradually as the model goes deeper, because the final output of, say, handwriting recognition is only the 10 digits 0-9, while the input image could be 1920x1080. That is what maxpooling is for: reducing the size of the image. The calculation is much simpler than convolution:


For the 4x4 input on the left, a 2x2 maxpooling simply takes the maximum of each 2x2 square, starting with the one in the upper left corner:


So this 4x4 matrix is halved along each dimension by a 2x2 maxpooling. That is exactly the purpose of maxpooling.
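A 2x2 maxpooling sketch in numpy (the 4x4 input values are made up for the example):

```python
import numpy as np

def maxpool2d(x, size=2):
    """Max pooling: take the maximum of each size x size block."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]          # drop rows/cols that don't fit
    blocks = x.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))               # max over each block

x = np.array([[1, 3, 2, 1],
              [4, 2, 1, 5],
              [7, 8, 0, 1],
              [2, 6, 3, 4]], dtype=float)
pooled = maxpool2d(x)  # [[4. 5.]
                       #  [8. 4.]] -- each entry is the max of one 2x2 square
```

Note that pooling has no trainable parameters at all; it just throws away everything except the strongest response in each block.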

1.4.3 Relu

The sigmoid function was introduced earlier as a kind of activation function. Relu is another activation function, more commonly used in the image field, and compared with sigmoid it is very simple:

relu(x) = max(0, x)


In other words, it outputs 0 when the input is less than 0 and passes the input through unchanged when it is greater than 0. It's that simple.
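One line of numpy is enough to express it:

```python
import numpy as np

def relu(x):
    """ReLU: 0 for negative inputs, identity for positive ones."""
    return np.maximum(0, x)

out = relu(np.array([-2.0, -0.5, 0.0, 1.5]))  # [0. 0. 0. 1.5]
```

Despite its simplicity, ReLU avoids the vanishing gradients that sigmoid suffers from in deep stacks, which is a big part of why it became the default in image models.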

1.4.4 Dropout

We have now introduced 3 operators (conv2d, maxpooling, relu), and each computation is very simple. Dropout is simpler still: it involves no calculation at all, so this section has no formula.

One thing not mentioned so far is the problem of model overfitting: during training, a model may fit the provided training set very well, but once it encounters data it has never seen, it cannot predict the right result at all. That is overfitting.

So how do we solve the overfitting problem? Dropout is a very simple and crude method: during training, randomly pick some of the layer's outputs, discard them, and reset them to 0. That is where the name Dropout comes from: just randomly throw some values away. It is an unbelievably simple method, yet surprisingly effective; for example, randomly dropping 60% of the values right after maxpooling can counter overfitting quite well.
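As an illustration, here is a minimal "inverted dropout" sketch; note that the rescaling of the survivors by 1/(1 − rate) is a common refinement beyond what the article describes, used so the expected magnitude of the layer is unchanged:

```python
import numpy as np

def dropout(x, rate=0.6, rng=None):
    """Training-time dropout: zero out roughly `rate` of the values at random
    and scale the survivors by 1/(1-rate) (inverted dropout)."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= rate   # True for values that survive
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
out = dropout(np.ones(10), rate=0.6, rng=rng)  # each entry is either 0.0 or 2.5
```

At inference time dropout is switched off entirely; it only perturbs the network during training.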

1.4.5 Flatten

Continuing the simple style of convolutional neural networks, there is no formula here either.

Flatten is just what the name suggests: flatten a 2-dimensional matrix into one dimension. For example, a matrix like this:


It's that simple...
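In numpy it is literally one call (the 2x2 matrix here is a made-up example):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
flat = x.flatten()  # [1 2 3 4] -- the 2-D matrix laid out as one row
```

Flatten is the bridge between the 2-D convolution/pooling layers and the 1-D Dense layers that follow them.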

1.4.6 Dense

Dense has in fact already been introduced: it is just a matrix multiplication followed by an addition:


So the convolution part really doesn't need much math either.
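A Dense (fully connected) layer as a sketch, with made-up weights mapping 2 inputs to 3 outputs:

```python
import numpy as np

def dense(x, W, b):
    """Fully connected layer: matrix multiply, then add the bias."""
    return W @ x + b

W = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])          # 3 outputs from 2 inputs
b = np.array([0.5, 0.5, 0.5])
out = dense(np.array([1.0, 2.0]), W, b)  # [1.5 4.5 3.5]
```

In a real layer an activation function would normally follow; W and b are exactly the parameters that training adjusts.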

1.4.7 Softmax

This is the last operator. For handwriting recognition the final output is 0-9, which means a 1x10 matrix; for example, the following prediction result (it is actually one row, written as two lines for display convenience):


In the 1x10 matrix above you can see that the 7th number, 0.753 (counting from 0), is far greater than the others, so we know the current prediction is 7. Softmax is used as the output layer of the model to produce these 10 numbers, each representing the probability that the picture is one of the digits 0-9; we take the largest probability as the prediction result. Note also that the 10 numbers above sum exactly to 1, so each really is a probability: the model thinks the probability that the digit is 1 is 0.000498, the probability that it is 2 is 0.000027, and so on. Such an intuitive and convenient result is what softmax computes.


For example, applying the softmax operation to the two numbers [1, 2]:


gives the two numbers [0.269, 0.731].
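The same [1, 2] example as a numpy sketch (the max-subtraction is a standard numerical-stability trick, not something the article requires):

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([1.0, 2.0]))  # rounds to [0.269, 0.731], summing to 1
```

Subtracting the maximum does not change the result, because softmax is invariant to adding a constant to all inputs; it just keeps exp() from overflowing on large scores.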

This first part has finally covered the operators used in convolutional neural networks. The second part will introduce how to train a handwriting recognition model with the Keras (TensorFlow) machine learning framework, and the third part will introduce how to import the generated model and run it on an STM32.

1.5 References

- Stanford's classic machine learning course videos
- Linear regression
- Back propagation
- Convolution operation






Copyright notice: This article was created by [__ Shirley]; please include a link to the original when reposting. Thank you.