The Most Complete Guide Ever! Understanding Common Convolution Types Through Illustrations


If you’ve heard of different kinds of convolutions in Deep Learning (e.g. 2D / 3D / 1x1 / Transposed / Dilated (Atrous) / Spatially Separable / Depthwise Separable / Flattened / Grouped / Shuffled Grouped Convolution), and are confused about what they actually mean, this article is written to help you understand how they actually work.

Here in this article, I summarize several types of convolution commonly used in Deep Learning, and try to explain them in a way that is accessible for everyone. Besides this article, there are several good articles from others on this topic. Please check them out (listed in the Reference).

Hope this article could help you to build up intuition and serve as a useful reference for your studies / research. Please feel free to leave comments and suggestions. Thanks and enjoy! :)

The content of this article includes:

  1. Convolution vs. Cross-correlation

  2. Convolution in Deep Learning (single channel version, multi-channel version)

  3. 3D Convolution

  4. 1 x 1 Convolution

  5. Convolution Arithmetic

  6. Transposed Convolution (Deconvolution, checkerboard artifacts)

  7. Dilated Convolution (Atrous Convolution)

  8. Separable Convolution (Spatially Separable Convolution, Depthwise Convolution)

  9. Flattened Convolution

  10. Grouped Convolution

  11. Shuffled Grouped Convolution

  12. Pointwise Grouped Convolution


1. Convolution vs. Cross-correlation

Convolution is a widely used technique in signal processing, image processing, and other engineering / science fields. In Deep Learning, a kind of model architecture, the Convolutional Neural Network (CNN), is named after this technique. However, convolution in deep learning is essentially cross-correlation in signal / image processing. There is a subtle difference between these two operations.

Without diving too deep into details, here is the difference. In signal / image processing, convolution is defined as:

(f * g)(t) = ∫ f(τ) g(t − τ) dτ

It is defined as the integral of the product of the two functions after one is reversed and shifted. The following visualization demonstrates the idea.

Convolution in signal processing. The filter g is reversed, and then slides along the horizontal axis. For every position, we calculate the area of the intersection between f and reversed g. The intersection area is the convolution value at that specific position. Image is adapted and edited from this link.

Here, function g is the filter. It’s reversed, and then slides along the horizontal axis. For every position, we calculate the area of the intersection between f and reversed g. That intersection area is the convolution value at that specific position.

On the other hand, cross-correlation is known as the sliding dot product or sliding inner-product of two functions. The filter in cross-correlation is not reversed. It directly slides through the function f. The intersection area between f and g is the cross-correlation. The plot below demonstrates the difference between convolution and cross-correlation.

Difference between convolution and cross-correlation in signal processing. Image is adapted and edited from Wikipedia.

In Deep Learning, the filters in convolution are not reversed. Rigorously speaking, it’s cross-correlation. We essentially perform element-wise multiplication and addition. But it’s a convention to just call it convolution in deep learning. It is fine because the weights of filters are learned during training. If the reversed function g in the example above is the right function, then after training the learned filter would look like the reversed function g. Thus, there is no need to reverse the filter first before training as in true convolution.
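The distinction above fits in a few lines of code. This is a minimal 1D sketch with made-up signals f and g (not the functions from the figures): the only difference between the two operations is whether g is reversed before sliding.

```python
def cross_correlate(f, g):
    """Slide g over f (no reversal) at every 'valid' position."""
    n = len(f) - len(g) + 1
    return [sum(f[i + j] * g[j] for j in range(len(g))) for i in range(n)]

def convolve(f, g):
    """True convolution: reverse g first, then cross-correlate."""
    return cross_correlate(f, g[::-1])

f = [1, 2, 3, 4, 5]   # made-up signal
g = [1, 0, -1]        # made-up filter (asymmetric, so the results differ)

print(cross_correlate(f, g))  # [-2, -2, -2]
print(convolve(f, g))         # [2, 2, 2]
```

For a symmetric g the two operations coincide, which is another reason the distinction rarely matters in practice.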


2. Convolution in Deep Learning

The purpose of doing convolution is to extract useful features from the input. In image processing, there is a wide range of different filters one could choose from for convolution. Each type of filter helps to extract different aspects or features from the input image, e.g. horizontal / vertical / diagonal edges. Similarly, in Convolutional Neural Networks, different features are extracted through convolution using filters whose weights are automatically learned during training. All these extracted features are then ‘combined’ to make decisions.

There are a few advantages of doing convolution, such as weight sharing and translation invariance. Convolution also takes the spatial relationship of pixels into consideration. This could be very helpful especially in many computer vision tasks, since those tasks often involve identifying objects where certain components have a certain spatial relationship with other components (e.g. a dog’s body usually links to a head, four legs, and a tail).

2.1. Convolution: the single channel version



Convolution for a single channel. Image is adapted from this link.

In Deep Learning, convolution is element-wise multiplication and addition. For an image with 1 channel, the convolution is demonstrated in the figure below. Here the filter is a 3 x 3 matrix with elements [[0, 1, 2], [2, 2, 0], [0, 1, 2]]. The filter slides through the input. At each position, it does element-wise multiplication and addition. Each sliding position ends up with one number. The final output is then a 3 x 3 matrix. (Notice that stride = 1 and padding = 0 in this example. These concepts are described in the arithmetic section below.)
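As a sketch of this computation: the 3 x 3 filter below is the one from the text, while the 5 x 5 input values are made up, since the animated figure is not reproduced here. With stride = 1 and padding = 0, the output is a 3 x 3 matrix.

```python
def conv2d_single_channel(image, kernel):
    k = len(kernel)
    n = len(image) - k + 1  # output size with stride = 1, padding = 0
    # Element-wise multiplication and addition at every sliding position
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(n)]
            for i in range(n)]

kernel = [[0, 1, 2],   # the filter from the text
          [2, 2, 0],
          [0, 1, 2]]

image = [[1, 0, 2, 1, 0],   # hypothetical 5 x 5 input
         [0, 1, 1, 0, 2],
         [2, 1, 0, 1, 1],
         [1, 0, 2, 2, 0],
         [0, 2, 1, 0, 1]]

out = conv2d_single_channel(image, kernel)
print(out)  # a 3 x 3 matrix; e.g. out[0][0] == 7
```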


2.2. Convolution: the multi-channel version

In many applications, we are dealing with images with multiple channels. A typical example is the RGB image. Each RGB channel emphasizes different aspects of the original image, as illustrated in the following image.

Different channels emphasize different aspects of the raw image. The image was taken at Yuanyang, Yunnan, China.

Another example of multi-channel data is the layers in a Convolutional Neural Network. A convolutional-net layer usually consists of multiple channels (typically hundreds of channels). Each channel describes different aspects of the previous layer. How do we make transitions between layers with different depths? How do we transform a layer with depth n into the following layer with depth m?

Before describing the process, we would like to clarify a few terminologies: layers, channels, feature maps, filters, and kernels. From a hierarchical point of view, the concepts of layers and filters are at the same level, while channels and kernels are one level below. Channels and feature maps are the same thing. A layer could have multiple channels (or feature maps): an input layer has 3 channels if the inputs are RGB images. The term “channel” is usually used to describe the structure of a “layer”. Similarly, “kernel” is used to describe the structure of a “filter”.


Difference between “layer” (“filter”) and “channel” (“kernel”).

The difference between filter and kernel is a bit tricky. Sometimes they are used interchangeably, which could create confusion. Essentially, these two terms have a subtle difference. A “kernel” refers to a 2D array of weights. The term “filter” is for 3D structures of multiple kernels stacked together. For a 2D filter, filter is the same as kernel. But for a 3D filter and most convolutions in deep learning, a filter is a collection of kernels. Each kernel is unique, emphasizing different aspects of the input channel.

With these concepts, the multi-channel convolution goes as follows. Each kernel is applied onto one input channel of the previous layer to generate one output channel. This is a kernel-wise process. We repeat such a process for all kernels to generate multiple channels. These channels are then summed together to form one single output channel. The following illustration should make the process clearer.
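The kernel-wise process can be sketched as follows. The shapes (a 5 x 5 x 3 input and a 3 x 3 x 3 filter) match the worked example that follows; the values are random placeholders.

```python
import numpy as np

def conv2d(channel, kernel):
    k = kernel.shape[0]
    n = channel.shape[0] - k + 1
    return np.array([[np.sum(channel[i:i + k, j:j + k] * kernel)
                      for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
inputs = rng.integers(0, 3, size=(3, 5, 5))  # 3 input channels, each 5 x 5
filt   = rng.integers(0, 3, size=(3, 3, 3))  # one filter = 3 kernels of 3 x 3

# Step 1 (kernel-wise): each kernel convolves its own input channel.
per_channel = [conv2d(inputs[c], filt[c]) for c in range(3)]

# Step 2: element-wise addition collapses the three 3 x 3 maps
# into one single output channel.
output = sum(per_channel)
print(output.shape)  # (3, 3)
```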


Here the input layer is a 5 x 5 x 3 matrix, with 3 channels. The filter is a 3 x 3 x 3 matrix. First, each of the kernels in the filter is applied to one of the three channels in the input layer, separately. Three convolutions are performed, resulting in 3 channels of size 3 x 3.



The first step of 2D convolution for multi-channels: each of the kernels in the filter is applied to one of the three channels in the input layer, separately. The image is adapted from this link.

Then these three channels are summed together (element-wise addition) to form one single channel (3 x 3 x 1). This channel is the result of convolution of the input layer (5 x 5 x 3 matrix) using a filter (3 x 3 x 3 matrix).



The second step of 2D convolution for multi-channels: these three channels are then summed together (element-wise addition) to form one single channel. The image is adapted from this link.

Equivalently, we can think of this process as sliding a 3D filter matrix through the input layer. Notice that the input layer and the filter have the same depth (channel number = kernel number). The 3D filter moves in only 2 directions, the height & width of the image (that’s why such an operation is called 2D convolution, although a 3D filter is used to process 3D volumetric data). At each sliding position, we perform element-wise multiplication and addition, which results in a single number. In the example shown below, the sliding is performed at 5 positions horizontally and 5 positions vertically. Overall, we get a single output channel.


Another way to think about 2D convolution: thinking of the process as sliding a 3D filter matrix through the input layer. Notice that the input layer and the filter have the same depth (channel number = kernel number). The 3D filter moves in only 2 directions, the height & width of the image (that’s why such an operation is called 2D convolution, although a 3D filter is used to process 3D volumetric data). The output is a one-layer matrix.

Now we can see how one can make transitions between layers with different depths. Let’s say the input layer has Din channels, and we want the output layer to have Dout channels. What we need to do is just apply Dout filters to the input layer. Each filter has Din kernels. Each filter provides one output channel. After applying Dout filters, we have Dout channels, which can then be stacked together to form the output layer.


Standard 2D convolution. Mapping one layer with depth Din to another layer with depth Dout, by using Dout filters.
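A shape-level sketch of this Din-to-Dout mapping, with made-up sizes (Din = 4, Dout = 6, a 7 x 7 input, and 3 x 3 kernels):

```python
import numpy as np

def conv_layer(inputs, filters):
    d_out, d_in, k, _ = filters.shape
    n = inputs.shape[1] - k + 1
    out = np.zeros((d_out, n, n))
    for f in range(d_out):          # each filter yields one output channel
        for c in range(d_in):       # each filter holds one kernel per input channel
            for i in range(n):
                for j in range(n):
                    out[f, i, j] += np.sum(inputs[c, i:i + k, j:j + k]
                                           * filters[f, c])
    return out

inputs  = np.ones((4, 7, 7))     # Din = 4 channels of size 7 x 7
filters = np.ones((6, 4, 3, 3))  # Dout = 6 filters, each with 4 kernels

out = conv_layer(inputs, filters)
print(out.shape)  # (6, 5, 5): Dout output channels, stacked into the new layer
```

With all-ones inputs and filters, every output value is 3 x 3 x 4 = 36, which makes the channel summation easy to check by hand.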


3. 3D Convolution

In the last illustration of the previous section, we saw that we were actually performing convolution on a 3D volume. But typically, we still call that operation 2D convolution in Deep Learning: it is a 2D convolution on 3D volumetric data. The filter depth is the same as the input layer depth. The 3D filter moves in only 2 directions (the height & width of the image). The output of such an operation is a 2D image (with 1 channel only).

Naturally, there are 3D convolutions. They are the generalization of the 2D convolution. In 3D convolution, the filter depth is smaller than the input layer depth (kernel size < channel size). As a result, the 3D filter can move in all 3 directions (height, width, and channel of the image). At each position, the element-wise multiplication and addition provide one number. Since the filter slides through a 3D space, the output numbers are arranged in a 3D space as well. The output is then 3D data.


In 3D convolution, a 3D filter can move in all 3 directions (height, width, and channel of the image). At each position, the element-wise multiplication and addition provide one number. Since the filter slides through a 3D space, the output numbers are arranged in a 3D space as well. The output is then 3D data.

Similar to 2D convolutions, which encode spatial relationships of objects in a 2D domain, 3D convolutions can describe the spatial relationships of objects in 3D space. Such a 3D relationship is important for some applications, such as 3D segmentation / reconstruction in biomedical imaging, e.g. CT and MRI, where objects such as blood vessels meander around in the 3D space.
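A minimal sketch of 3D convolution, where the kernel depth is smaller than the input depth, so the filter also slides along the depth axis; the shapes below are illustrative, not from the article.

```python
import numpy as np

def conv3d(volume, kernel):
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):          # depth (channel) direction
        for i in range(out.shape[1]):      # height
            for j in range(out.shape[2]):  # width
                out[z, i, j] = np.sum(volume[z:z + kd, i:i + kh, j:j + kw]
                                      * kernel)
    return out

volume = np.ones((4, 6, 6))  # input depth 4
kernel = np.ones((2, 3, 3))  # kernel depth 2 < input depth 4

out = conv3d(volume, kernel)
print(out.shape)  # (3, 4, 4): the output is itself a 3D volume
```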


4. 1 x 1 Convolution

Since we talked about depth-wise operation in the previous section of 3D convolution, let’s look at another interesting operation, 1 x 1 convolution.

You may wonder why this is helpful. Do we just multiply a number to every number in the input layer? Yes and No. The operation is trivial for layers with only one channel. There, we multiply every element by a number.

Things become interesting if the input layer has multiple channels. The following picture illustrates how 1 x 1 convolution works for an input layer with dimension H x W x D. After 1 x 1 convolution with filter size 1 x 1 x D, the output channel is of dimension H x W x 1. If we apply N such 1 x 1 convolutions and then concatenate the results together, we could have an output layer with dimension H x W x N.


1 x 1 convolution, where the filter size is 1 x 1 x D.

Initially, 1 x 1 convolutions were proposed in the Network-in-network paper. They were then widely used in the Google Inception paper. A few advantages of 1 x 1 convolutions are:

  • Dimensionality reduction for efficient computations

  • Efficient low dimensional embedding, or feature pooling

  • Applying nonlinearity again after convolution

The first two advantages can be observed in the image above. After 1 x 1 convolution, we significantly reduce the dimension depth-wise. Say the original input has 200 channels; the 1 x 1 convolution will embed these channels (features) into a single channel. The third advantage is that after the 1 x 1 convolution, non-linear activations such as ReLU can be added. The non-linearity allows the network to learn more complex functions.
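The channel-embedding view can be sketched as a matrix product at every pixel. The sizes below (H = W = 8, and D = 200 channels reduced to N = 32) are made up for illustration:

```python
import numpy as np

H, W, D, N = 8, 8, 200, 32
x = np.random.default_rng(0).normal(size=(H, W, D))  # input layer, H x W x D
w = np.random.default_rng(1).normal(size=(N, D))     # N filters of size 1 x 1 x D

# At every spatial position, each output channel is a dot product over
# the D input channels -- no spatial neighborhood is involved.
y = np.einsum('hwd,nd->hwn', x, w)
print(y.shape)  # (8, 8, 32): depth reduced from 200 to 32

# Sanity check against an explicit per-pixel matrix-vector product:
assert np.allclose(y[0, 0], w @ x[0, 0])
```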

These advantages were described in Google’s Inception paper as:

“One big problem with the above modules, at least in this naïve form, is that even a modest number of 5x5 convolutions can be prohibitively expensive on top of a convolutional layer with a large number of filters.
This leads to the second idea of the proposed architecture: judiciously applying dimension reductions and projections wherever the computational requirements would increase too much otherwise. This is based on the success of embeddings: even low dimensional embeddings might contain a lot of information about a relatively large image patch… That is, 1 x 1 convolutions are used to compute reductions before the expensive 3 x 3 and 5 x 5 convolutions. Besides being used as reductions, they also include the use of rectified linear activation which makes them dual-purpose.”

One interesting perspective regarding 1 x 1 convolution comes from Yann LeCun: “In Convolutional Nets, there is no such thing as ‘fully-connected layers’. There are only convolution layers with 1x1 convolution kernels and a full connection table.”


5. Convolution Arithmetic

We now know how to deal with depth in convolution. Let’s move on to talk about how to handle the convolution in the other two directions (height & width), as well as important convolution arithmetic.

Here are a few terminologies:

  • Kernel size: kernel is discussed in the previous section. The kernel size defines the field of view of the convolution.

  • Stride: it defines the step size of the kernel when sliding through the image. A stride of 1 means that the kernel slides through the image pixel by pixel. A stride of 2 means that the kernel slides through the image by moving 2 pixels per step (i.e., skipping 1 pixel). We can use a stride (>= 2) for downsampling an image.

  • Padding: the padding defines how the border of an image is handled. A padded convolution (‘same’ padding in Tensorflow) keeps the spatial output dimensions equal to those of the input image, by padding 0 around the input boundaries if necessary. On the other hand, an unpadded convolution (‘valid’ padding in Tensorflow) only performs convolution on the pixels of the input image, without adding 0 around the input boundaries. The output size is smaller than the input size.

The following illustration describes a 2D convolution using a kernel size of 3, stride of 1 and padding of 1.

There is an excellent article about the detailed arithmetic (“A guide to convolution arithmetic for deep learning”). One may refer to it for detailed descriptions and examples of different combinations of kernel size, stride, and padding. Here I just summarize the result for the most general case.

For an input image of size i, kernel size k, padding p, and stride s, the output image from convolution has size o = floor((i + 2p − k) / s) + 1.
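The formula can be checked against a few familiar settings (unpadded, ‘same’-padded, and strided convolutions):

```python
import math

def out_size(i, k, p, s):
    """o = floor((i + 2p - k) / s) + 1"""
    return math.floor((i + 2 * p - k) / s) + 1

print(out_size(5, 3, 0, 1))  # unpadded ('valid') 3x3 over a 5x5 input -> 3
print(out_size(5, 3, 1, 1))  # 'same' padding keeps the spatial size   -> 5
print(out_size(7, 3, 1, 2))  # stride 2 downsamples                    -> 4
```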


6. Transposed Convolution (Deconvolution)

For many applications and in many network architectures, we often want to do transformations going in the opposite direction of a normal convolution, i.e. we’d like to perform up-sampling. A few examples include generating high-resolution images and mapping a low-dimensional feature map to a high-dimensional space, such as in auto-encoders or semantic segmentation. (In the latter example, semantic segmentation first extracts feature maps in the encoder and then restores the original image size in the decoder so that it can classify every pixel in the original image.)

Traditionally, one could achieve up-sampling by applying interpolation schemes or manually creating rules. Modern architectures such as neural networks, on the other hand, tend to let the network itself learn the proper transformation automatically, without human intervention. To achieve that, we can use the transposed convolution.

The transposed convolution is also known as deconvolution, or fractionally strided convolution, in the literature. However, it’s worth noting that the name “deconvolution” is less appropriate, since transposed convolution is not the real deconvolution as defined in signal / image processing. Technically speaking, deconvolution in signal processing reverses the convolution operation. That is not the case here. Because of that, some authors are strongly against calling transposed convolution deconvolution. People call it deconvolution mainly for simplicity. Later, we will see why calling such an operation transposed convolution is natural and more appropriate.

It is always possible to implement a transposed convolution with a direct convolution. For the example in the image below, we apply transposed convolution with a 3 x 3 kernel over a 2 x 2 input padded with a 2 x 2 border of zeros, using unit strides. The up-sampled output has size 4 x 4.

Up-sampling a 2 x 2 input to a 4 x 4 output. Image is adapted from this link.

Interestingly enough, one can map the same 2 x 2 input image to a different image size by applying fancy padding & stride. Below, transposed convolution is applied over the same 2 x 2 input (with 1 zero inserted between inputs) padded with a 2 x 2 border of zeros, using unit strides. Now the output has size 5 x 5.
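Both examples can be reproduced as a direct convolution: dilate the 2 x 2 input (optionally inserting one zero between the input values), pad it with a 2 x 2 border of zeros, and convolve with a 3 x 3 kernel at unit stride. The input values and the all-ones kernel below are made up; only the shapes follow the text.

```python
import numpy as np

def conv2d(img, kernel):
    k = kernel.shape[0]
    n = img.shape[0] - k + 1
    return np.array([[np.sum(img[i:i + k, j:j + k] * kernel)
                      for j in range(n)] for i in range(n)])

def transposed_conv(img, kernel, insert_zeros=0):
    z = insert_zeros
    size = img.shape[0] + (img.shape[0] - 1) * z
    dilated = np.zeros((size, size))
    dilated[::z + 1, ::z + 1] = img        # insert zeros between inputs
    padded = np.pad(dilated, 2)            # 2 x 2 border of zeros
    return conv2d(padded, kernel)          # direct convolution, unit stride

x = np.arange(1.0, 5.0).reshape(2, 2)  # made-up 2 x 2 input
k = np.ones((3, 3))                    # made-up 3 x 3 kernel

y1 = transposed_conv(x, k)                  # first example in the text
y2 = transposed_conv(x, k, insert_zeros=1)  # second example
print(y1.shape, y2.shape)  # (4, 4) (5, 5)
```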


Viewing transposed convolution in the examples above could help us build up some intuition. But to generalize its application, it is beneficial to look at how it is implemented through matrix multiplication in a computer. From there, we can also see why “transposed convolution” is an appropriate name.

In convolution, let us define C as our kernel matrix, Large as the input image, and Small as the output image from convolution. After the convolution (matrix multiplication), we down-sample the large image into a small output image. The implementation of convolution as matrix multiplication follows as C x Large = Small.

The following example shows how such an operation works. It flattens the input to a 16 x 1 vector, and transforms the kernel into a sparse matrix (4 x 16). The matrix multiplication is then applied between the sparse matrix and the flattened input. After that, the resulting vector (4 x 1) is transformed back to a 2 x 2 output.

Matrix multiplication for convolution: from a Large input image (4 x 4) to a Small output image (2 x 2).

Now, if we multiply both sides of the equation by the transposed matrix CT, we can map in the opposite direction: CT x Small produces a 16 x 1 vector, which reshapes back to a 4 x 4 image. The transposed matrix (16 x 4) up-samples a small image into a large one, and that is exactly why the operation is called transposed convolution.
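A sketch of this matrix view, building the sparse 4 x 16 matrix C from a made-up 3 x 3 kernel: C times the flattened 4 x 4 input gives the flattened 2 x 2 output, and multiplying by CT maps a 4-vector back to a 16-vector, i.e. an up-sampled 4 x 4 image (shape-wise; it is not an exact inverse of the convolution).

```python
import numpy as np

def conv_matrix(kernel, in_size=4):
    """Rearrange a k x k kernel into the sparse convolution matrix C."""
    k = kernel.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for i in range(out_size):
        for j in range(out_size):
            for di in range(k):
                for dj in range(k):
                    C[i * out_size + j,
                      (i + di) * in_size + (j + dj)] = kernel[di, dj]
    return C

kernel = np.arange(9.0).reshape(3, 3)  # made-up 3 x 3 kernel
C = conv_matrix(kernel)                # 4 x 16 sparse matrix

large = np.arange(16.0)     # flattened 4 x 4 input image
small = C @ large           # forward pass: flattened 2 x 2 output
up = C.T @ small            # transposed pass: 16 values, i.e. a 4 x 4 image

print(C.shape, small.shape, up.reshape(4, 4).shape)

# The matrix product matches the direct sliding-window convolution:
assert np.isclose(small[0], np.sum(large.reshape(4, 4)[:3, :3] * kernel))
```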
