I was looking for effective work on deep convolutional neural networks and came across this paper. Sharing the link, as it might be useful to those working on the topic:
Abstract: Deep Convolutional Neural Networks (CNNs) have shown superior performance on the task of single-label image classification. However, the applicability of CNNs to multi-label images still remains an open problem, mainly because of two reasons. First, each image is usually treated as an inseparable entity and represented as one instance, which mixes the visual information corresponding to different labels. Second, the correlations amongst labels are often overlooked. To address these limitations, we propose a deep Multi-Modal CNN for Multi-Instance Multi-Label image classification, called MMCNN-MIML. By combining CNNs with Multi-Instance Multi-Label (MIML) learning, our model represents each image as a bag of instances for image classification and inherits the merits of both CNNs and MIML. In particular, MMCNN-MIML has three main appealing properties: i) It can automatically generate instance representations for MIML by exploiting the architecture of CNNs. ii) It takes advantage of the label correlations by grouping labels in its later layers. iii) It incorporates the textual context of label groups to generate multi-modal instances, which are effective in discriminating visually similar objects belonging to different groups. Empirical studies on several benchmark multi-label image datasets show that MMCNN-MIML significantly outperforms the state-of-the-art baselines on multi-label image classification.
The whole paper is available here:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8432496
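
For anyone new to the "bag of instances" idea mentioned in the abstract, here is a rough sketch of how multi-instance multi-label prediction can work in general. It is not taken from the paper and does not reproduce MMCNN-MIML: the label grouping and the textual context are left out, and max-pooling over instances is just one common aggregation choice that I am assuming here. The function names, label names, and scores are all made up for illustration.

```python
import math


def sigmoid(x):
    """Squash a raw score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def bag_label_probs(instance_scores):
    """Turn per-instance label scores into one probability per label.

    instance_scores: list of instances, each a list of raw scores
    (one score per label). Assumption: the bag-level score for a label
    is the maximum over its instances, i.e. an image has a label if at
    least one of its regions strongly suggests that label.
    """
    num_labels = len(instance_scores[0])
    pooled = [
        max(instance[k] for instance in instance_scores)
        for k in range(num_labels)
    ]
    return [sigmoid(score) for score in pooled]


def predict_labels(instance_scores, label_names, threshold=0.5):
    """Return every label whose bag-level probability reaches the threshold."""
    probs = bag_label_probs(instance_scores)
    return [name for name, p in zip(label_names, probs) if p >= threshold]


# Hypothetical example: one image represented as a bag of three instances.
labels = ["dog", "ball", "car"]
bag = [
    [2.0, -1.0, -3.0],   # instance that mostly looks like a dog
    [-1.5, 1.2, -2.0],   # instance that mostly looks like a ball
    [-2.0, -2.5, -1.0],  # background-like instance
]

print(predict_labels(bag, labels))  # ['dog', 'ball']
```

The point of the sketch is only that each label can be decided by a different instance, so visual evidence for "dog" and "ball" does not get mixed into a single image-level feature. In a CNN-based model the instance scores would come from the network itself rather than being hand-written as they are here.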