Talking About Machine Learning: an introduction that even readers with no background can follow

Abstract: Traditionally, if we want a computer to do some work, we give it a list of instructions, and it executes them step by step to produce a clear, well-defined result. This approach does not apply to machine learning. A machine learning system does not accept the instructions you enter at all; instead, it accepts the data you enter!

In this article I will give an overview of machine learning. Its purpose is to let even readers who know nothing about machine learning understand it and get started with related practice. This document can also be regarded as a bonus chapter of the EasyPR development series: understanding machine learning is necessary before the core of EasyPR can be introduced. Of course, the article is also aimed at general readers and assumes no prerequisites.

Before entering the topic, I suspect a doubt may arise in the reader's mind: why is machine learning important enough to justify reading such a long article?

I will not answer this question directly. Instead, let me invite you to look at two pictures.

The first is Figure 1:


Figure 1 The union of machine learning's leading figures and the Internet giants

The three people in this picture are leading figures of today's machine learning community. In the middle is Geoffrey Hinton, a professor at the University of Toronto in Canada, now head of Google Brain. On the right is Yann LeCun, a professor at New York University, now director of the Facebook Artificial Intelligence lab. On the left is the familiar Andrew Ng (Chinese name Wu Enda), an associate professor at Stanford University, now head of the "Baidu Brain" project and Baidu's chief scientist. That all three have been hired by Internet giants speaks to their importance, and their shared research direction is a subfield of machine learning: deep learning.

The second is Figure 2:


Figure 2 Voice assistant products

What does this picture show? Cortana, the voice assistant on Windows Phone, named after the Master Chief's AI companion in the Halo games. Compared with its competitors, Microsoft launched this service quite late. What is the core technology behind Cortana, and why can it understand human speech? That technology is exactly machine learning. Machine learning is the key technology that lets all voice assistant products (including Apple's Siri and Google Now) interact with people.

From these two pictures, I believe you can see that machine learning is a very important technology with many aspects still unknown to most people, and that learning it looks like an interesting task. Indeed, studying machine learning not only helps us understand the latest trends in the Internet industry, but also reveals the technologies behind the convenient services we use every day.


What is machine learning, and why does it have such magic? These questions are exactly what this article answers. Since the article is called "Talking About Machine Learning", it introduces the whole landscape around machine learning, including related disciplines (such as data mining and computer vision) and algorithms (neural networks, SVM, and so on). The main contents are as follows:

1. A story that explains what machine learning is

2. The definition of machine learning

3. The scope of machine learning

4. Machine learning methods

5. Applications of machine learning: big data

6. A subclass of machine learning: deep learning

7. The parent class of machine learning: artificial intelligence

8. A thought on machine learning: the computer's subconscious

9. Summary

10. Postscript

1. A story that explains what machine learning is

The term "machine learning" can be puzzling. It is a literal translation of the English name Machine Learning (ML for short). In computing, "machine" generally refers to the computer. The name is anthropomorphic, suggesting a technique that lets the machine "learn". But how can a lifeless computer "learn"?

Traditionally, if we want a computer to do some work, we give it a list of instructions, and it executes them step by step to produce a clear result. This approach does not apply to machine learning. A machine learning system does not accept the instructions you enter; it accepts the data you enter! In other words, machine learning is a way of having computers perform tasks using data rather than instructions.

This sounds incredible, but it is entirely feasible. The idea of "statistics" will accompany you throughout your study of machine learning concepts. The notion of correlation, rather than causation, is the core concept that makes machine learning work, and it will overturn the causal thinking that underlies all the programs you have written before.

Below I use a story to clarify simply what machine learning is. The story is intended to illustrate the concept; it is not fully developed here, but the relevant content and the core idea are present. If you just want a brief idea of what machine learning is, reading this story is enough. If you want to learn about machine learning and its contemporary technologies in more depth, keep reading: there is much richer content below.

This example comes from my real life. When I was thinking about the problem in it, I suddenly realized that its process could be extended into a complete machine learning process, so I decided to use it to open the whole introduction. The story is called "the problem of waiting for someone".

I believe everyone has had the experience of arranging to meet someone and then waiting for them. In reality not everyone is punctual, so when you run into someone who tends to be late, your time is inevitably wasted. I have encountered exactly such a case.

A friend of mine, Xiao Y, is not very punctual; most commonly, he is late. Once I arranged to meet him at McDonald's at 3 o'clock, and the moment I was about to leave I suddenly thought of a question: should I set out now? Or will I spend 30 minutes waiting for him after I arrive? I decided to adopt a strategy to solve this problem.

There are several ways to solve it. The first is the knowledge method: search for knowledge that can solve this problem. Unfortunately, nobody has ever passed down "how to wait for people" as a body of knowledge, so I cannot find existing knowledge to solve it. The second is to ask others: ask other people how to solve the problem.

But likewise, nobody can answer, because probably nobody has run into exactly the same situation as me. The third is the rule method: ask my own heart whether I have established any rule for facing this problem. For example, "I always arrive on time, regardless of others." But I am not such a rigid person, and I have established no such rule.

In fact, I believe a fourth method is more suitable than the three above: I replay my past meetings with Xiao Y in my mind, see in what proportion of them he was late, and use this to predict the probability that he will be late this time.

If this value exceeds a certain threshold in my mind, I choose to wait a while before setting out. Suppose I have met Xiao Y about 5 times and he was late once; then he is on time 80% of the time. The threshold in my mind is 70%, so I judge that this time Xiao Y should not be late, and I leave on time.

If Xiao Y had instead been late 4 times out of 5, i.e. on time only 20% of the time, this value would be below my threshold and I would choose to postpone leaving. From the perspective of what it draws on, this approach can be called the empirical method. In applying it, I effectively used all my past data of the same kind. Therefore, it can also be called judgment based on data.

Judgment based on data is fundamentally consistent with the idea of machine learning.

In the thinking just now, I considered only one attribute, the "frequency" of lateness. In real machine learning this alone would hardly make an application. A typical machine learning model considers at least two kinds of quantities: one is the dependent variable, the result we want to predict; in this example, the judgment of whether Xiao Y will be late.

The other is the independent variable, the quantity used to predict whether Xiao Y will be late. Suppose I take the day as an independent variable: for example, I notice that the days Xiao Y was late were basically all Fridays, while on non-Fridays he was basically not late. Then I can build a model relating the probability of Xiao Y being late to whether the day is a Friday. See the figure below:


Figure 3 Decision tree model

Such a graph is the simplest machine learning model, called a decision tree.

When we consider only one independent variable, the situation is simple. Suppose we add one more: for example, part of the time Xiao Y is late is when he drives (you can imagine that he drives badly, or that the roads are congested). I can take this information into account and build a more complex model with two independent variables and one dependent variable.

More complicated still, Xiao Y's lateness is also somewhat related to the weather, for example when it rains. Then I need to consider three independent variables.

If I want to predict the specific number of minutes Xiao Y will be late, I can build a model relating each of his past delay times to the amount of rain and the independent variables considered before. Such a model can predict a value, for example roughly how many minutes late he will be, which helps me plan my departure better. In this situation a decision tree is not well suited, because decision trees can only predict discrete values. We can instead use the linear regression method introduced in Section 2 to build this model.
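The empirical method from this story can be sketched in a few lines of code. This is only an illustrative sketch: the meeting records, the 70% threshold, and the Friday effect are all invented for the example.

```python
# Estimate how likely Xiao Y is to be on time, optionally conditioned on
# whether the day is a Friday (the independent variable from the story).
# All data here is made up for illustration.

def on_time_rate(history, is_friday=None):
    """history: list of (was_friday, was_on_time) pairs from past meetings."""
    relevant = [on_time for friday, on_time in history
                if is_friday is None or friday == is_friday]
    if not relevant:
        return None  # no experience to draw on
    return sum(relevant) / len(relevant)

def should_leave_on_time(history, is_friday, threshold=0.7):
    """Leave on time only if the estimated on-time rate beats the threshold."""
    rate = on_time_rate(history, is_friday)
    return rate is not None and rate >= threshold

# Five past meetings: in this invented record, Xiao Y was late only on Fridays.
history = [(True, False), (False, True), (False, True),
           (False, True), (True, False)]

print(on_time_rate(history))                           # overall: 3/5 = 0.6
print(should_leave_on_time(history, is_friday=False))  # True
print(should_leave_on_time(history, is_friday=True))   # False
```

Conditioning on the Friday feature changes the decision, which is exactly what the decision tree in Figure 3 expresses.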

If I hand this modeling process over to a computer — for example, input all the independent and dependent variables, let the computer generate a model for me, and then let the computer, based on my current situation, suggest whether I need to delay leaving — then the process by which the computer performs this assisted decision-making is a machine learning process.

Machine learning is the method by which a computer uses existing data (experience) to derive a certain model (the pattern of lateness) and uses that model to predict the future (whether he will be late).

From the above analysis we can see that machine learning resembles the human process of reasoning from experience, except that it can consider more factors and perform more complex computations. In fact, one main purpose of machine learning is to turn the human process of summarizing experience through thought into a computer process of computing a model from data. Models derived by computers can solve many flexible, complex problems in a roughly human-like manner.

Below I begin a formal introduction to machine learning, covering its definition, scope, methods, applications, and more.

2. The definition of machine learning

Broadly speaking, machine learning is a method of endowing machines with the ability to learn, so that they can perform functions that direct programming cannot accomplish. In practical terms, machine learning is a method of using data to train a model and then using the model to make predictions.

Let us look at a concrete example.


Figure 4 Example of house prices

Take housing, that topic of national interest. Suppose I have a house to sell: what price should I put on it? The house's area is 100 square meters. Should the price be 1 million, 1.2 million, or 1.4 million?

Clearly, I would like to find some pattern relating house prices to area. How do I obtain it? Use the average price data from the newspaper? Or look at houses of similar area sold by others? Either way, it does not seem very reliable.

What I want is a rule that reasonably and faithfully reflects the relationship between area and price. So I surveyed some houses similar to mine and obtained a set of data, containing the areas and prices of houses of various sizes. If I can work out the relationship between area and price from this data, I can obtain the price of my house.

Finding the pattern is simple: fit a straight line that "passes through" all the points, keeping its distance from each point as small as possible.

Through this line I obtain the rule that best reflects the relationship between price and area. The line corresponds to the following formula:

price = area × a + b

Here a and b are the parameters of the line. Once these parameters are obtained, I can compute the price of the house.

Suppose a = 0.75 and b = 50 (with prices in units of 10,000 yuan); then price = 100 × 0.75 + 50 = 125, i.e. 1.25 million. This differs from the 1 million, 1.2 million, and 1.4 million listed earlier, but since the line takes most of the cases into account, it is the most reasonable prediction in the "statistical" sense.
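This line fit can be computed directly: for a single independent variable, the least-squares solution has a closed form (a = covariance(x, y) / variance(x)). The data below is fabricated so that the fitted line reproduces the a = 0.75, b = 50 of the example, with prices in units of 10,000 yuan.

```python
# Closed-form least-squares fit of price = a * area + b.
# The sample data is fabricated so the fit lands exactly at a=0.75, b=50.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # a = covariance(x, y) / variance(x); b makes the line pass the mean point
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

areas  = [80, 90, 100, 110, 120]          # square meters
prices = [110, 117.5, 125, 132.5, 140]    # 10,000 yuan

a, b = fit_line(areas, prices)
print(a, b)         # 0.75 50.0 on this exactly-linear data
print(a * 100 + b)  # predicted price of a 100 m² house: 125.0
```

With real, noisy data the fitted line would not pass through every point; it would minimize the sum of squared distances, as described above.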

This solution reveals two pieces of information:

1. The house price model depends on the type of function being fitted. If it is a straight line, we fit a linear equation; if it is another kind of curve, such as a parabola, we fit a parabolic equation. Machine learning has many algorithms, and some powerful ones can fit complex nonlinear models to reflect situations a straight line cannot express.

2. The more data I have, the more situations my model can take into account, and the better it may predict new situations. This is one manifestation of the "data is king" idea in the machine learning world. In general (though not absolutely), the more data, the better the model produced by machine learning predicts.

Through this line-fitting exercise we can review the complete machine learning process. First, we store historical data in the computer. Next, we process the data with a machine learning algorithm; this step is called "training" in machine learning. The result of this processing, which can be used to predict new data, is generally called a "model". Predicting new data is called "prediction". "Training" and "prediction" are the two stages of machine learning, and the "model" is the intermediate output: "training" produces the "model", and the "model" guides "prediction".

Let us compare the machine learning process with the human process of summarizing historical experience.


Figure 5 Analogy between machine learning and human thinking

Humans accumulate a great deal of history and experience in the course of growing up and living. They regularly "generalize" from these experiences to obtain "laws" of life. When encountering unknown problems, or needing to "guess" the future, humans apply these "laws" to the unknown, guiding their lives and work.

The "training" and "prediction" stages of machine learning correspond to the human processes of "induction" and "speculation". Through this correspondence we can see that the idea of machine learning is not complicated: it is simply a simulation of how humans learn and grow throughout life. Since machine learning is not derived by programmed logic, its conclusions are reached not through causal reasoning but through inductive thinking.

This is also a reminder of why humans study history. History is a summary of mankind's past experience. There is a good saying: "History is often different, but it is always surprisingly similar." By studying history we can distill from it the laws of life and of nations and use them to guide our next steps, which is of great value. Some people in modern times have overlooked this original value of history and used it merely as a means of singing praises; that is a misuse of its true worth.

3. The scope of machine learning

Although the above explains what machine learning is, it does not give its scope.

In fact, machine learning is deeply connected with pattern recognition, statistical learning, data mining, computer vision, speech recognition, and natural language processing.

In terms of scope, machine learning is similar to pattern recognition, statistical learning, and data mining. At the same time, the combination of machine learning with processing techniques from other fields has formed interdisciplinary subjects such as computer vision, speech recognition, and natural language processing. For this reason, generally speaking, data mining can be roughly equated with machine learning.

At the same time, what we usually call machine learning applications should be understood broadly: not only structured data, but also applications over images, audio, and the like.

Introducing these related fields here helps us clarify the application scenarios and research scope of machine learning, and better understand its position between underlying algorithms and application layers.

The figure below shows some of the disciplines and research areas related to machine learning.


Figure 6 Machine Learning and Related Subjects

Pattern recognition

Pattern recognition = machine learning. The main difference between the two is that the former grew out of industry, while the latter derives mainly from computer science. In the famous book "Pattern Recognition And Machine Learning", Christopher M. Bishop says at the outset: "Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years."

Data mining

Data mining = machine learning + databases. In recent years the concept of data mining has become thoroughly familiar, to the point of hype. The hype boasts of things like digging gold out of data or turning waste data into value. However, although I might mine gold, I might also mine only "stones".

What this means is that data mining is only a way of thinking: it tells us we should try to dig knowledge out of data, but not every piece of data yields gold, so do not mythologize it. A system will never become omnipotent just because a data mining module is bolted on (something IBM most likes to boast about). On the contrary, what matters is a person with data mining thinking who also understands the data deeply; only then can models derived from the data guide improvements to the business. Most algorithms in data mining are optimizations of machine learning algorithms for running over databases.

Statistical learning

Statistical learning is approximately equal to machine learning. Statistical learning is a discipline that overlaps heavily with machine learning, because most methods in machine learning come from statistics. It can even be argued that the development of statistics propelled the prosperity of machine learning. For example, the famous support vector machine algorithm came out of the statistics community.

To some extent, however, the two differ: statistical learners focus on developing and optimizing statistical models and lean toward mathematics, while machine learners care more about solving problems and lean toward practice, concentrating on improving the efficiency and accuracy of learning algorithms executed on computers.

Computer vision

Computer vision = image processing + machine learning. Image processing techniques turn an image into input suitable for a machine learning model, and machine learning is responsible for recognizing the relevant patterns in the image. Computer vision has many applications, such as Baidu image search, handwritten character recognition, and license plate recognition. It is a very promising application field and a popular research direction. With the development of deep learning, a new area of machine learning, computer image recognition has been greatly advanced, so the future of computer vision is immeasurable.

Speech recognition

Speech recognition = speech processing + machine learning, that is, the combination of audio processing techniques with machine learning. Speech recognition technology is generally not used alone but combined with natural language processing. Current applications include Apple's voice assistant Siri.

Natural language processing

Natural language processing = text processing + machine learning. Natural language processing is the field of having machines understand human language. It makes heavy use of techniques related to compiler theory, such as lexical analysis and grammar analysis; at the level of understanding, it also uses semantic understanding, machine learning, and so on.

As language is the only symbolic system created by human beings, natural language processing has always been an ongoing research direction of machine learning. According to Yu Kai, a machine learning expert at Baidu, "hearing and seeing are, to put it bluntly, things even cats and dogs can do; only language is unique to humans." How to use machine learning to understand natural language has always been a focus of attention in both industry and academia.

It is clear that machine learning extends into and is applied in many fields. The development of machine learning technology has driven progress in many intelligent applications and improved our lives.

4. Machine learning methods

The previous section gave us the general scope of machine learning. So what are the classic algorithms of machine learning? In this section I briefly introduce its classic representative methods. The focus is on the intuition behind these methods; mathematical and practical details are not discussed here.

1. Regression algorithms

In most machine learning courses, regression algorithms are introduced first, for two reasons. First, regression algorithms are relatively simple, and introducing them lets people migrate smoothly from statistics to machine learning. Second, regression algorithms are the cornerstone of several powerful algorithms later on; without understanding regression, those algorithms cannot be learned. Regression has two important subclasses: linear regression and logistic regression.

Linear regression is what we discussed in the house price example. How do we fit a straight line that best matches all of my data? The usual answer is the "least squares" method. The idea of least squares is this: assume the fitted line represents the true values of the data, while the observed data represent values with errors.

To minimize the effect of the errors, we seek the line that minimizes the sum of squared errors. The least squares method thus converts the fitting problem into a problem of finding a function's extremum. In mathematics, extrema are typically found by setting the derivative to zero, but this approach is not always suitable for computers: the equation may be unsolvable in closed form, or too computationally expensive.

Computer science has a dedicated discipline called "numerical computation", devoted to improving the accuracy and efficiency of computers in performing all kinds of calculations. For example, the famous "gradient descent" and "Newton's method" are classic algorithms from numerical computation, and are also well suited to finding function extrema.

Gradient descent is one of the simplest and most effective methods for solving regression models. Strictly speaking, because both neural networks and recommendation algorithms contain linear-regression-like factors inside, gradient descent is also used in the implementations of the algorithms that follow.
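As an illustrative sketch of gradient descent (toy data and a hand-picked learning rate, not tuned for any real problem), here is the line-fitting problem solved iteratively rather than by setting derivatives to zero:

```python
# Fit y = a*x + b by gradient descent on the mean squared error.
# If the learning rate is too large the iteration diverges -- exactly the
# kind of issue that numerical computation studies.

def gradient_descent(xs, ys, lr=0.05, steps=2000):
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of (1/n) * sum((a*x + b - y)^2) w.r.t. a and b
        grad_a = (2 / n) * sum((a * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((a * x + b - y) for x, y in zip(xs, ys))
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Toy data generated from y = 2x + 1; the iteration should recover a≈2, b≈1.
a, b = gradient_descent([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(round(a, 3), round(b, 3))  # 2.0 1.0
```

Each step moves the parameters a small distance in the direction that reduces the squared error, and the iteration converges toward the same answer the least squares method gives in closed form.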

Logistic regression is an algorithm very similar to linear regression, but in essence the type of problem it handles differs. Linear regression handles numerical problems, where the final prediction is a number, such as a house price. Logistic regression belongs to the classification algorithms: its prediction is a discrete class, such as judging whether an email is spam, or whether a user will click on an advertisement.

In terms of implementation, logistic regression simply applies a Sigmoid function to the result of a linear regression, converting the numerical result into a probability between 0 and 1. (The graph of the Sigmoid function is not especially intuitive; you only need to know that the larger the input, the closer the function's value gets to 1, and the smaller the input, the closer it gets to 0.) We can then make a prediction based on this probability: for example, if the probability is greater than 0.5, the email is classified as spam, or the tumor as malignant. Intuitively, logistic regression draws a classification line; see the figure below.


Figure 7 Intuitive explanation of logistic regression

Suppose we have data on a group of cancer patients. Some patients' tumors are benign (the blue dots in the figure) and some are malignant (the red dots). The red or blue color of a tumor can be called the "label" of the data point. Each data point also includes two "features": the patient's age and the size of the tumor. Mapping the two features and the label onto this two-dimensional plane produces the data in the figure above.

When a new data point appears (the green dot), how should I judge whether the tumor is malignant or benign? From the red and blue points we have trained a logistic regression model, which is the classification line in the figure. Since the green point falls on the left side of the classification line, we judge its label to be red, that is, a malignant tumor.

The classification line drawn by the logistic regression algorithm is basically linear (there are variants of logistic regression that draw nonlinear classification lines, but such models are very inefficient with large amounts of data). This means that when the boundary between the two classes is not linear, the expressive power of logistic regression is insufficient. The two algorithms below are among the most powerful and important in the machine learning world, and both can fit nonlinear classification lines.
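Before moving on, the Sigmoid squashing step described above can be written out in a few lines. This is a minimal sketch: in a real model the score would come from a fitted linear function of the features, and the numbers here are invented.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def predict_spam(score, threshold=0.5):
    """Logistic regression: linear score -> Sigmoid -> probability -> label."""
    p = sigmoid(score)
    return p, p > threshold

print(sigmoid(0))    # 0.5: exactly on the decision boundary
print(sigmoid(6))    # ~0.998: strongly positive score, close to 1
print(sigmoid(-6))   # ~0.002: strongly negative score, close to 0
```

The 0.5 threshold corresponds to the classification line in Figure 7: points whose linear score is zero lie exactly on the line.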

2. Neural networks

Neural networks (also called Artificial Neural Networks, ANN) were a very popular machine learning algorithm in the 1980s, then declined in the mid-1990s. Now, riding the momentum of "deep learning", neural networks have made a comeback and become one of the most powerful machine learning algorithms.

The birth of neural networks stems from research into the working mechanism of the brain. Early biologists used neural networks to model the brain. Machine learning scholars then used neural networks in machine learning experiments and found them very effective for both vision and speech recognition. After the birth of the BP algorithm (a numerical algorithm that accelerates the training of neural networks), the development of neural networks entered a boom. One of the inventors of the BP algorithm is the machine learning giant Geoffrey Hinton (the middle figure in Figure 1).

Specifically, what is the learning mechanism of a neural network? In simple terms: decomposition and integration. In the famous Hubel-Wiesel experiment, researchers studied the visual processing mechanism of cats.


Figure 8 The Hubel-Wiesel experiment and the brain's visual mechanism

For example, a square is decomposed into four polylines that enter the next layer of visual processing. Each of four neurons handles one polyline; each polyline is further decomposed into two straight lines, and each straight line into black-and-white contrast. In this way a complex image is turned into a large number of details entering the neurons; after the neurons process them, the results are integrated again, and finally the brain concludes that it saw a square. This is the visual recognition mechanism of the brain, and also the working mechanism of neural networks.

Let us look at the logical architecture of a simple neural network. The network is divided into an input layer, a hidden layer, and an output layer. The input layer receives the signal, the hidden layer decomposes and processes the data, and the final result is integrated at the output layer. Each circle in a layer represents a processing unit, which can be thought of as simulating one neuron; several processing units form a layer, and several layers form a network, that is, a "neural network".


Figure 9 The logical architecture of a neural network

In a neural network, each processing unit is in fact a logistic regression model: it receives inputs from the layer above and passes its prediction as output to the layer below. By repeating this process, neural networks can accomplish very complex nonlinear classification.
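A toy illustration of this "each unit is a logistic regression" idea: with hand-picked weights (chosen for illustration, not learned by training), a two-layer network computes XOR, a nonlinear classification that no single logistic regression unit can express.

```python
import math

def unit(inputs, weights, bias):
    """One processing unit: a logistic regression over its inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

def xor_net(x1, x2):
    # Hidden layer: two units decompose the problem
    h1 = unit([x1, x2], [20, 20], -10)    # approximately x1 OR x2
    h2 = unit([x1, x2], [-20, -20], 30)   # approximately NOT (x1 AND x2)
    # Output layer integrates the pieces: h1 AND h2 gives XOR
    return unit([h1, h2], [20, 20], -30)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))  # prints 0, 1, 1, 0
```

The hidden layer decomposes the problem and the output layer integrates the pieces, mirroring the decomposition-and-integration mechanism described above.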

The figure below demonstrates a famous application of neural networks in image recognition: a program called LeNet, a neural network built on multiple hidden layers. LeNet can recognize a variety of handwritten digits with high recognition accuracy and good robustness.


Figure 10 The effect of LeNet

An input image is shown in the square at the lower right, and the computer's output appears in the red square above the word "answer". The three vertical image columns on the left show the outputs of the three hidden layers of the network. It can be seen that as the layers go deeper, the details being processed become lower-level; for example, layer 3 is essentially processing line details. LeNet's inventor is Yann LeCun, one of the machine learning leaders introduced earlier (on the right in Figure 1).

In the 1990s, the development of neural networks entered a bottleneck period. The main reason is that, despite the speedup brought by the BP algorithm, training neural networks remained very difficult. As a result, the support vector machine (SVM) algorithm replaced neural networks in the late 1990s.

3. SVM (Support Vector Machines)

The support vector machine (SVM) algorithm is a classic algorithm born in the statistical learning community, and at the same time it shines in the machine learning world.

In a sense, the SVM algorithm is an enhancement of the logistic regression algorithm: by imposing stricter optimization conditions on logistic regression, SVM can obtain a better classification boundary. Without a certain kind of function technique, however, the SVM algorithm is at best a better linear classification technique.

But combined with a Gaussian "kernel", a support vector machine can express very complex classification boundaries and thereby achieve very good classification results. A "kernel" is in fact a special kind of function, whose most typical feature is that it can map a low-dimensional space to a high-dimensional one.

For example, consider the figure below:


Figure 11 Support Vector Machine Legend

How do we draw a circular classification boundary in a two-dimensional plane? It may be difficult in two dimensions, but a "kernel" can map the two-dimensional space into three dimensions, where a linear plane achieves the same effect. In other words, a nonlinear classification boundary in the two-dimensional plane is equivalent to a linear classification boundary in three-dimensional space. We can therefore achieve a nonlinear partition in the plane through a simple linear partition in three-dimensional space.
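As an illustration of this idea, the sketch below uses the explicit polynomial feature map φ(x1, x2) = (x1², √2·x1·x2, x2²), which corresponds to a simple polynomial kernel: a circle in the plane becomes a flat plane in the lifted three-dimensional space, so a purely linear test suffices. The radius and the test points are made up for illustration.

```python
import math

def lift(x1, x2):
    # Explicit polynomial feature map phi: R^2 -> R^3.
    # The circle x1^2 + x2^2 = r^2 becomes the plane z1 + z3 = r^2.
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def inside_circle_linear(point, r=1.0):
    # A linear test in the lifted space: w . phi(x) + b < 0,
    # with w = (1, 0, 1) and b = -r^2.
    z = lift(*point)
    return z[0] + z[2] - r * r < 0

print(inside_circle_linear((0.3, 0.4)))   # point inside the unit circle
print(inside_circle_linear((1.0, 1.0)))   # point outside the unit circle
```

A real SVM never computes the lifted coordinates explicitly; the "kernel trick" evaluates the inner products in the high-dimensional space directly, which is what keeps the computation efficient.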


Figure 12 Three-dimensional space cutting

Support vector machines are machine learning algorithms with a very strong mathematical flavor (by contrast, neural networks have a biological flavor). A core step of the algorithm proves that mapping data from low to high dimensions does not increase the final computational complexity. Thus the support vector machine algorithm can both maintain computational efficiency and obtain very good classification results. For this reason, SVMs occupied the most important position in machine learning in the late 1990s and largely replaced neural network algorithms. Only now, with neural networks re-emerging through deep learning, has the delicate balance between the two shifted again.

4. Clustering Algorithms

One prominent feature of the preceding algorithms is that the training data carry labels, and the trained model can predict labels for other, unknown data. In the following algorithms, the training data carry no labels, and the algorithm's purpose is to infer labels for the data through training. Such algorithms go by a general name: unsupervised algorithms (algorithms trained on labeled data are supervised). The most typical representative of unsupervised algorithms is the clustering algorithm.

Again take two-dimensional data: each data point has two features. Suppose I want a clustering algorithm to tag their different types. What should I do? In simple terms, a clustering algorithm computes distances within the population and divides the data into several groups according to those distances.

The most typical representative of clustering algorithms is the K-Means algorithm.
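As a concrete sketch (not a production implementation), here is a minimal K-Means in pure Python: it alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its group. The toy data are made up for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Minimal K-Means: alternate between assigning each point to its
    # nearest centroid and moving each centroid to its group's mean.
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                                + (p[1] - centroids[j][1]) ** 2)
            groups[nearest].append(p)
        centroids = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

# Two obvious clusters, one around (0, 0) and one around (10, 10).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(data, 2)))
```

On this well-separated toy data the algorithm converges to one centroid per cluster; on harder data K-Means is sensitive to initialization, which is why libraries restart it several times.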

5. Dimensionality Reduction Algorithms

Dimensionality reduction is also an unsupervised learning algorithm; its main feature is reducing data from high dimensions to low dimensions. Here, the dimension is the number of features in the data. For example, housing price data with four features, the length, width, area, and number of rooms of the house, is 4-dimensional data.

Notice that length and width overlap with the information carried by area, since area = length × width. A dimensionality reduction algorithm can remove this redundant information and reduce the features to just area and number of rooms, compressing the 4-dimensional data to 2 dimensions. Reducing data from high to low dimensions not only simplifies representation but also speeds up computation.
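The housing example above can be sketched with PCA. The code below generates hypothetical housing data in which area = length × width, then uses the singular value decomposition of the centered data to keep the top two principal components; because area dominates the variance and is strongly correlated with length and width, two components preserve almost all of the information. All numbers here are made up for illustration.

```python
import numpy as np

# Hypothetical housing data: [length, width, area, rooms]; area = length * width,
# so one dimension is redundant and the intrinsic dimensionality is lower than 4.
rng = np.random.default_rng(0)
length = rng.uniform(8, 15, size=100)
width = rng.uniform(5, 10, size=100)
rooms = rng.integers(1, 6, size=100).astype(float)
X = np.column_stack([length, width, length * width, rooms])

# PCA via SVD of the centered data: keep the top-2 principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T          # 100 x 4 compressed to 100 x 2

# Fraction of total variance retained by the two kept components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(X2.shape, round(explained, 3))
```

Note that PCA is a linear method while area = length × width is a nonlinear relation, so the compression is not lossless here; it simply retains as much variance as two linear directions can.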

In the example just given, the redundant dimension is visible to the naked eye, and removing it causes no loss of information (because the information was redundant). If the redundancy is not visible to the naked eye, or the features are not actually redundant, dimensionality reduction can still work, though it will lose some information. However, it can be proven mathematically that the low-dimensional representation preserves as much of the data's information as possible, so using dimensionality reduction still brings many benefits.

The main roles of dimensionality reduction algorithms are to compress data and to improve the efficiency of other machine learning algorithms. With a dimensionality reduction algorithm, data with thousands of features can be compressed to just a few. Another benefit is data visualization: for example, 5-dimensional data can be compressed to 2 dimensions and then displayed on a two-dimensional plane. The main representative of dimensionality reduction algorithms is PCA (Principal Component Analysis).

6. Recommendation Algorithms

Recommendation algorithms are extremely popular in industry today and are widely used in e-commerce, for example by Amazon, Tmall, and JD.com. Their main feature is automatically recommending to users the things they are most interested in, thereby increasing purchase rates and revenue. There are two main categories of recommendation algorithms:

One category is content-based recommendation, which recommends items similar to what the user has already purchased. This requires every item to carry several tags so that items similar to the user's purchases can be found. The advantage is that the recommendations are highly relevant; the drawback is that tagging every item takes a great deal of work.

The other category is recommendation based on user similarity, which recommends to the target user the things purchased by other users with the same interests. For example, Xiao A has bought items B and C; the algorithm finds that Xiao D, a user similar to Xiao A, has bought item E, so item E is recommended to Xiao A.

Both categories have their own strengths and weaknesses, and in typical e-commerce applications the two are generally used together. The most famous recommendation algorithm is collaborative filtering.
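The Xiao A / Xiao D example can be sketched as user-based collaborative filtering: represent each user as a purchase vector, find the user most similar to the target by cosine similarity, and recommend what that user bought but the target has not. The names and purchases mirror the example above and are purely illustrative.

```python
def cosine(u, v):
    # Cosine similarity between two users' purchase vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target, users, items):
    # User-based collaborative filtering: find the most similar other user
    # and recommend items they bought that the target has not.
    best = max((u for u in users if u is not target),
               key=lambda u: cosine(target, u))
    return [items[i] for i, (t, b) in enumerate(zip(target, best)) if b and not t]

items = ["B", "C", "E", "F"]
xiao_a = [1, 1, 0, 0]          # Xiao A bought items B and C
xiao_d = [1, 1, 1, 0]          # Xiao D is similar to Xiao A and also bought E
other  = [0, 0, 0, 1]          # an unrelated user
print(recommend(xiao_a, [xiao_a, xiao_d, other], items))
```

Real collaborative filtering aggregates over many similar users and weights by similarity; this single-neighbor version only shows the core idea.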

7. Others

Besides the algorithms above, the machine learning world includes many others, such as Gaussian discriminant analysis, naive Bayes, and decision trees. But the six algorithms listed above are the most widely used, most influential, and most representative. One characteristic of the machine learning community is its sheer number of algorithms, flourishing in every direction.

To summarize: depending on whether the training data is labeled, the algorithms above can be divided into supervised and unsupervised learning algorithms. Recommendation algorithms are a special case, belonging to neither category.

Supervised learning algorithms: linear regression, logistic regression, neural networks, SVM

Unsupervised learning algorithms: clustering algorithms, dimensionality reduction algorithms

Special algorithms: recommendation algorithms

Besides these, the names of some other algorithms also appear frequently in the machine learning field. They are not machine learning algorithms in their own right but were born to solve particular subproblems. You can think of them as sub-algorithms of the ones above, used to greatly speed up the training process. Representative examples: gradient descent, used mainly in linear regression, logistic regression, neural networks, and recommendation algorithms; Newton's method, used mainly in linear regression; the BP algorithm, used mainly in neural networks; and the SMO algorithm, used mainly in SVMs.
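As an illustration of gradient descent, the sub-algorithm mentioned above, here is a minimal sketch that fits a line y = a·x + b by repeatedly stepping against the gradient of the mean squared error, the same procedure used (in more elaborate forms) to train linear regression, logistic regression, and neural networks. The data and learning rate are made up for illustration.

```python
def gradient_descent(xs, ys, lr=0.01, steps=2000):
    # Fit y = a*x + b by repeatedly stepping opposite the gradient
    # of the mean squared error with respect to a and b.
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Data generated from y = 3x + 1; gradient descent should recover a ≈ 3, b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
a, b = gradient_descent(xs, ys)
print(round(a, 2), round(b, 2))
```

The learning rate `lr` controls the step size: too large and the iteration diverges, too small and convergence is needlessly slow, which is exactly why such sub-algorithms matter for training speed.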

5. Applications of Machine Learning – Big Data

Having covered machine learning methods, let us now discuss its applications. Before 2010, machine learning already played a major role in certain specific domains, such as license plate recognition, network attack prevention, and handwritten character recognition. Since 2010, however, with the rise of the big data concept, a large share of machine learning applications have become tightly coupled with big data; it is almost fair to say that big data is the ideal setting for applying machine learning.

For example, virtually every article introducing the magic of big data claims that big data accurately predicted something. A classic example is Google using big data to predict an H1N1 outbreak in a small American town.


Figure 13 Google successfully predicted H1N1

Baidu predicted the 2014 World Cup, getting every match right from the knockout stage through the final.


Figure 14 Baidu correctly predicted the results of every World Cup match

These feats seem almost magical. So what exactly gives big data this power? In short: machine learning. It is the application of machine learning techniques that allows data to work its magic.

The core of big data is exploiting the value of data, and machine learning is the key technology for doing so; for big data, machine learning is indispensable. Conversely, for machine learning, more data tends to improve model accuracy, and the computation time of complex machine learning algorithms urgently demands technologies such as distributed computing and in-memory computing. The flourishing of machine learning thus also depends on big data. The two promote each other and depend on each other.

Machine learning and big data are closely linked. But we must clearly recognize that big data is not the same as machine learning, and likewise machine learning is not the same as big data. Big data encompasses many technologies, including distributed computing, in-memory databases, and multidimensional analysis. Looking at analysis methods alone, big data includes the following four approaches:

Big data, small analysis: the OLAP analysis approach from the data warehousing field, i.e. multidimensional analysis.

Big data, big analysis: this represents data mining and machine learning methods.

Streaming analysis: this mainly refers to event-driven architectures.

Query analysis: the classic representative is the NoSQL database.

In other words, machine learning is only one of the analysis methods under big data. Although some machine learning results are remarkable and in certain settings are the best illustration of big data's value, this does not mean machine learning is the only way to analyze big data.

The combination of machine learning and big data has produced enormous value. Thanks to advances in machine learning, data can be used to "predict". For humans, the richer our accumulated experience and the broader our exposure, the more accurate our judgments about the future; the often-cited advantage of an "experienced" person over a green newcomer lies precisely in the fact that the patterns the experienced person has extracted are more accurate. In the machine learning field, a well-known experiment effectively confirmed an analogous theory: the more data a machine learning model is given, the better its predictions become. See the figure below:


Figure 15 The relationship between machine learning accuracy and data volume

The figure shows that once the amount of input data reaches a certain scale, many different algorithms achieve similarly high accuracy. Hence the famous saying in machine learning: a successful machine learning application is not the one with the best algorithm, but the one with the most data!

In the big data era, many advantages make machine learning more widely applicable. With the growth of the Internet of Things and mobile devices, we have ever more data, including unstructured data such as images, text, and video, which gives machine learning models ever more to learn from. Meanwhile, distributed computing frameworks in big data technology such as Map-Reduce make machine learning ever faster and more convenient to use. All of these advantages let machine learning shine in the big data era.

6. A Subclass of Machine Learning – Deep Learning

Recently, a new direction has emerged in machine learning: "deep learning".

Although the term sounds lofty, the idea behind it is very simple: it is the traditional neural network extended to many hidden layers.

As introduced above, neural networks fell quiet after the 1990s. But Geoffrey Hinton, one of the inventors of the BP algorithm, never gave up on neural network research. Because training becomes very slow once the number of hidden layers grows beyond two, neural networks long remained less practical than support vector machines. In 2006, Hinton published a paper in the journal Science arguing two points:

Neural networks with many hidden layers have excellent feature learning ability; the learned features characterize the data more fundamentally, which benefits visualization and classification.

The difficulty of training deep neural networks can be effectively overcome by "layer-wise initialization" (pre-training).


Figure 16 Geoffrey Hinton and his student publish in Science

These findings not only resolved the computational difficulty of training neural networks but also demonstrated the superiority of deep networks at learning. Since then, neural networks have once again become a mainstream, powerful learning technique in machine learning.

A neural network with multiple hidden layers is called a deep neural network, and research on learning with deep neural networks is called deep learning.

Because of its importance, deep learning has attracted enormous attention on all fronts. In chronological order, four landmark events are worth mentioning:

In June 2012, The New York Times reported on the Google Brain project, led jointly by Andrew Ng and Jeff Dean, a co-creator of Map-Reduce. Using a parallel computing platform of 16,000 CPU cores, it trained a machine learning model called a "deep neural network" and achieved great success in fields such as speech recognition and image recognition. Andrew Ng is the machine learning leader introduced at the start of this article (on the left in Figure 1).

In November 2012, at an event in Tianjin, China, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system. As the presenter spoke in English, the computer seamlessly performed speech recognition, English-to-Chinese machine translation, and Chinese speech synthesis, with very fluent results. The key supporting technology was deep learning.

In January 2013, at Baidu's annual meeting, founder and CEO Robin Li announced with much fanfare the establishment of the Baidu Research Institute, with deep learning as its first key direction, founding the Institute of Deep Learning (IDL) for this purpose.

In April 2013, MIT Technology Review ranked deep learning first among its ten Breakthrough Technologies of 2013.


Figure 17 The deep learning boom

The three machine learning leaders listed at the start of this article are not only experts in machine learning but also pioneers of deep learning research. They were made the technical helmsmen of major Internet companies not only because of their technical strength, but because the field they study, deep learning, has boundless prospects.

Much of the recent progress in image recognition and speech recognition in industry stems from advances in deep learning. Besides voice assistants such as Cortana mentioned at the start of this article, applications include image recognition systems, a typical example being Baidu's image search feature shown below.


Figure 18 Baidu image recognition

Deep learning is a subclass of machine learning. Its development has greatly raised the standing of machine learning and, further, renewed industry's attention to the dream of machine learning's parent class: artificial intelligence.

7. The Parent Class of Machine Learning – Artificial Intelligence

Artificial intelligence is the parent class of machine learning, and deep learning is its subclass. The relationship among the three is shown in the figure below:


Figure 19 The relationship among deep learning, machine learning, and artificial intelligence

Without doubt, artificial intelligence (AI) is the most groundbreaking invention the technology world can imagine. In a sense, like the name of the game Final Fantasy, AI is humanity's ultimate dream for technology. Since the concept was proposed in the 1950s, academia and industry have explored and studied it continuously.

During this time, novels and films have depicted imagined artificial intelligence in all sorts of ways. That humans could invent a machine resembling themselves: what a grand idea! Yet since the 1950s the development of AI has been bumpy, with no sufficiently stunning scientific breakthrough.

In summary, AI's development has gone through several stages, from early logical reasoning to the expert systems of the middle period. These advances did bring us somewhat closer to machine intelligence, but a large gap remained. Only after the birth of machine learning did the AI community feel it had finally found the right direction. Machine-learning-based image recognition and speech recognition have reached human-comparable levels in certain vertical domains. Machine learning has brought humanity closer than ever to the dream of artificial intelligence.

In fact, if we compare AI-related technologies with technologies elsewhere in industry, we can see that machine learning's central position in AI is no accident.

The author believes that the chief thing distinguishing humans from other objects, plants, and animals is "wisdom". And what is the best embodiment of wisdom?

Is it computing ability? Probably not: someone who does fast mental arithmetic we generally call a prodigy.

Is it reaction speed? No: someone who reacts quickly we call agile.

Is it memory? No: someone with an excellent memory we say has photographic recall.

Is it reasoning ability? Such a person we might call highly intelligent, a "Sherlock Holmes", but we would not say he possesses wisdom.

Is it knowledge? Such a person we call learned and well-read, but again we would not call him wise.

Think about it: whom do we usually describe as having great wisdom? Sages such as Zhuangzi and Laozi. Wisdom is insight into life, the accumulation of and reflection on experience. How similar this is to the idea of machine learning: extracting patterns from experience to guide one's life and future. Without experience there is no wisdom.


Figure 20 Machine learning and wisdom

From the computer's point of view, each of the abilities above has a corresponding technology.

For computing we have distributed computing; for reaction we have event-driven architectures; for retrieval we have search engines; for knowledge storage we have data warehouses; for logical reasoning we have expert systems. But for the most salient feature of wisdom, the ability to generalize and to distill insight, only machine learning corresponds. This is the fundamental reason machine learning best represents wisdom.

Consider the building of a robot: with powerful computation, massive storage, fast retrieval, rapid reaction, and excellent logical reasoning already in hand, if we add a powerful "wise brain", a true artificial intelligence may be born. This is why, with machine learning developing so rapidly today, artificial intelligence may no longer be a dream.

The development of AI may depend not only on machine learning but even more on the deep learning introduced above. Because deep learning deeply imitates the structure of the human brain, it has significantly broken through the limits of earlier machine learning techniques in visual and speech recognition, and is thus very likely the key technology for truly realizing the AI dream. Both the Google Brain and the Baidu Brain are built from deep learning networks with massive numbers of layers. Perhaps, with the help of deep learning, a computer with human-like intelligence may genuinely become possible in the near future.

Finally, a digression: the rapid development of AI powered by deep learning has already raised concerns among leading figures in traditional technology circles. Elon Musk, Tesla's CEO and the real-world "Iron Man", is one of them. At a recent MIT symposium, Musk expressed his worries about AI: "Research into artificial intelligence is like summoning the demon; we need to be careful in certain respects."


Figure 21 Musk and artificial intelligence

Although Musk's worry may sound alarmist, his reasoning is not without merit. "If an AI wanted to eliminate spam, its final decision might be to eliminate humanity." Musk believes the way to prevent such outcomes is to introduce government regulation.

The author's view here is similar to Musk's: imposing rules and constraints on an artificial intelligence from its very birth may be effective. That is, we should not rely on pure machine learning alone; combining machine learning with systems such as rule engines can better address this kind of problem. If learning is unconstrained, it can easily go astray and must be given some guidance, just as in human society law serves as the best set of rules: "a murderer must pay with his life" marks a line that may not be crossed as humanity pursues greater productivity.

Here we must note the difference between such rules and the patterns machine learning derives. A pattern is not a strict criterion; it mostly provides probabilistic guidance. A rule, by contrast, is inviolable and unmodifiable. Patterns can be adjusted; rules cannot be changed. Effectively combining the characteristics of patterns and rules can lead to a reasonable, controllable, learning artificial intelligence.

8. Reflections on Machine Learning – The Computer's Subconscious

Finally, the author would like to share some thoughts on machine learning, mainly insights distilled from daily life.

Recall the story I told in Section 1: I listed Xiao Y's past record of keeping our appointments. But only a minority of people would enumerate all their past experience like this; most people use a more direct method, namely intuition. So what is intuition?

In fact, intuition is also a pattern you arrive at by processing experience subconsciously. It is like obtaining a model through a machine learning algorithm: the next time, you simply use it directly.
