key: cord-0854611-cspm02zw
authors: Liu, Jingmiao; Ren, Yu; Qin, Xiaotong
title: Study on 3D Clothing Color Application Based on Deep Learning-Enabled Macro-Micro Adversarial Network and Human Body Modeling
date: 2021-09-07
journal: Comput Intell Neurosci
DOI: 10.1155/2021/9918175
sha: c61ce8837fb30ad4a77c905e28d287153578d027
doc_id: 854611
cord_uid: cspm02zw

As daily life becomes simpler and more convenient, the popularity of online shopping has led more and more research to explore how to make shopping easier, and the virtual fitting system is one product of this research. However, virtual fitting systems are still immature and have many problems; for example, clothing color is rendered unclearly or with deviation. In view of this, this paper proposes a deep learning-based 3D clothing color display model driven by human body modeling. First, the macro-micro adversarial network (MMAN) based on deep learning is used to analyze the original image, and the results are then preprocessed. Finally, a 3D model carrying the colors of the original image is constructed by UV mapping. The experimental results show that the accuracy of the MMAN algorithm reaches 0.972, the established three-dimensional model is vivid enough, the clothing color is rendered clearly, the color difference from the original image is within 0.01, and the subjective evaluation of volunteers is more than 90 points. These results show that deep learning is effective for building a 3D model that carries the clothing color of the original picture, which has guiding significance for research on character modeling and simulation.

With the continuous improvement of quality of life and the growing popularity of e-commerce, fitting systems that help people try on clothes virtually have gradually come into view. Shi et al. proposed a fully automatic 3D virtual assembly system based on a 2D whole-body image and a 3D garment model; the system constructs a 3D human body model from the 2D human body image input by the user and was found to be stable and effective in simulation training [1]. Guo et al. proposed a remote fitting system for clothing e-commerce, which simulates the body shape of consumers through a physical assembly robot and helps consumers try on clothes remotely [2]. To improve the accuracy of body shape measurement in clothing fitting, Lu et al. proposed a synthetic dataset of human avatars under wide clothing, including 1500 synthetic people with different body shapes, postures, clothing, skin colors, and backgrounds [3]. Hu et al. proposed the first fully automatic general method for virtual fitting of clothing, hair, shoes, watches, necklaces, hats, and other items. A large number of experimental results show that this method realizes dynamic fitting well [4]. Liu et al. proposed a fitting method based on the 3D virtual assembly of the human body contour, which needs only one RGB image and transfers the garment texture to the human body contour under the guidance of the UV texture map; it achieved good results in testing [5]. In 2020, Kurniawati et al. proposed using Microsoft Kinect tracking technology and 3D modeling mixer manipulation technology to establish an interactive 3D virtual fitting room; the system can evaluate how well clothing matches and help users choose the most suitable clothing [6]. In response to the impact of COVID-19, Meng put forward a virtual garment fitting system for the convenience of consumers and studied the color saturation of the system. The experimental results show that the proposed method has high efficiency [7].
Based on the model parameters derived from inaccurate anthropometric data, Ziegler et al. proposed a comprehensive identification method for the geometric parameters of the human lower limb kinematics model, based on three-dimensional marker positions and parameters for determining the (cycling) gait trajectory [8]. Kimpara et al. proposed an active driving system based on the human body model. The system adjusts the dynamics of the human body model so that the model responds realistically to the maneuver [9]. Han et al. proposed mapping sequential views to sequence labels as a new deep learning model based on the encoding and decoding structure of RNNs. The results show that this method can aggregate sequential views more effectively to learn more distinctive global features [10]. Jiang et al. proposed an effective deep learning model based on the convolutional neural network, which was used for 3D underground mapping from 2D ground observation data to construct 3D address features [11]. Chen and Huang studied feature extraction for the 3D art design model based on deep learning, proposed a 3D art network social communication method, and verified its feasibility [12]. Zhang studied the application of the dominant color extraction algorithm in clothing image retrieval and proposed an algorithm to extract the main colors in a clothing image [13]. Kim et al. proposed a digital transformation method for human clothing in images, which combines the geometric and style characteristics of clothing with a neural network. In the virtual try-on experiment, it performed better than the traditional comprehensive method [14].

To sum up, most current research worldwide focuses on the construction and application of human body models. In addition, as demand grows, research on clothing color is also increasing. However, most of this research remains superficial. By using human modeling as an automatic driver to simulate the transformation of clothing color, the clothing color of a virtual human character can be generated. This is the key to optimizing the virtual fitting simulation system. Therefore, this research combines the deep learning technology of artificial intelligence with human body modeling to study the generation of a three-dimensional persona clothing color model from a two-dimensional clothing image. Such a convenient and effective method will have great value in theoretical research and practical application.

In this paper, the meanings for some important parameters have been explained in Table 1 . With the help of this table, some equations can be understood easily.

2.1. Clothing Color Analysis. The HSV color space, which is composed of hue, saturation, and lightness, is used in this study. It is usually represented by a closed inverted cone, where H represents color information, i.e., hue, which is the position of the spectral color; S represents saturation, i.e., the ratio between the purity of the selected color and the maximum purity of that color; and V represents brightness [15] [16] [17]. On the top face of the HSV color space, the hues start from red and proceed counterclockwise through red, yellow, green, cyan, blue, and magenta. The counterclockwise angle indicates hue. The change from the inside to the outside indicates saturation: the center represents 0, and the outermost edge represents 100% saturation. The change from bottom to top indicates lightness: the bottom point is black, with a lightness of 0, and the top has a lightness of 100%. Because the visual effect of color is affected by hue, saturation, and lightness, the color is adjusted according to the visual effect, as shown in Table 2.

As shown in Table 2, hue values are assigned in a counterclockwise direction: red is 0°, yellow is 60°, green is 120°, cyan is 180°, blue is 240°, and magenta is 300°, and the range of each color family extends 30° on either side of its central value. Saturation is the degree of purity of a color. According to the information given in Figure 1, the color space is divided into 9 segments by saturation from the inside to the outside, and low saturation, medium saturation, and high saturation each cover three of these segments. Lightness represents how light a color is and is divided into low lightness, medium lightness, and high lightness from bottom to top. In the same way, the HSV color space is divided into 9 segments from bottom to top, each lightness level occupies 3 segments from low to high, and the numerical ranges in the table are divided equally according to this segmentation.
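The classification described above can be illustrated with a minimal Python sketch. It assumes that the low, medium, and high levels are equal thirds of the [0, 1] range, because the numeric ranges of Table 2 are not reproduced here; the function names `hue_family`, `level`, and `describe_color` are hypothetical.

```python
# Six hue families centered at 0, 60, ..., 300 degrees.
HUE_FAMILIES = ["red", "yellow", "green", "cyan", "blue", "magenta"]


def hue_family(h_deg):
    """Map a hue angle in degrees to a color family.

    Each family covers 30 degrees on either side of its center, so red
    wraps around from 330 to 30 degrees. A hue exactly on a boundary
    (for example 30) is assigned to the next family (an assumption).
    """
    h = h_deg % 360.0
    index = int(((h + 30.0) % 360.0) // 60.0)
    return HUE_FAMILIES[index]


def level(value):
    """Map a saturation or lightness value in [0, 1] to low/medium/high.

    Assumption: each level covers three of the nine equal segments,
    i.e. one third of the range.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    if value < 1.0 / 3.0:
        return "low"
    if value < 2.0 / 3.0:
        return "medium"
    return "high"


def describe_color(h_deg, s, v):
    """Return (hue family, saturation level, lightness level)."""
    return hue_family(h_deg), level(s), level(v)
```

For example, `describe_color(350, 0.9, 0.5)` falls in the red family with high saturation and medium lightness.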

Human body analysis segments a human image into multiple semantic blocks. It usually uses a convolutional neural network architecture for feature recognition, and the pixel loss caused by semantic inconsistency is usually handled by subsequent processing with a conditional random field. However, a large number of studies have found that the range of inconsistency a conditional random field can handle is very limited, and it may produce poor label maps [18, 19]. Therefore, to address these problems, adversarial networks have gradually emerged. The main reason is that the discriminator introduced into the adversarial network to identify human parts can enforce a higher level of consistency [20, 21].

As research on adversarial networks has deepened, their shortcomings have also begun to appear. For example, in the network model, a single discriminator generates only one adversarial loss in backpropagation, and this one adversarial loss must train the two targets of local inconsistency and semantic inconsistency. In addition, the adversarial loss from a single discriminator makes training unbalanced, resulting in poor convergence of the model [22]. Therefore, the adversarial network used in this study is the macro-micro adversarial network (MMAN). The main feature of this adversarial network is that it contains two discriminators, which can identify label images at different pixel scales. The framework of the 3D virtual model is shown in Figure 1.

As shown in Figure 1, the MMAN model framework is mainly composed of three parts. The first is a dual output generator. For the input image, the model automatically adjusts the image size and then uses the dual output generator to generate a low-resolution tensor, from which the low-resolution and high-resolution label images are obtained. The second is the macrodiscriminator, which is composed of four convolution layers. Its main function is to supervise the low-resolution label image output by the dual output generator so that global semantic inconsistency is penalized. The third is the microdiscriminator, which has a shallow structure of three convolution layers. Its main function is to supervise the high-resolution label map and penalize local inconsistency. The training process of the recognition model of the MMAN algorithm is analyzed as follows. First, given a figure image with the shape 3 * L * W and a label image with the shape C * L * W, the pixel classification loss can be expressed as follows:

$$L_{mce}(G) = -\sum_{i=1}^{L \times W} \sum_{c=1}^{C} y_{ic} \log \hat{y}_{ic} \tag{1}$$

In formula (1), $G$ represents the dual output generator, $C$ is the number of classes including the background, and $y_{ic}$ is the ground truth probability of the $c$th class on the $i$th pixel: when the $i$th pixel belongs to class $c$, its value is 1; otherwise, its value is 0. $\hat{y}_{ic}$ represents the predicted probability of the $c$th class on the $i$th pixel. To ensure that the final outputs of the dual output generator remain consistent and that the loss in the final image output is small, this study combines the classification loss and the adversarial loss, so as to ensure high-quality image output. See the following formula for the specific expression:

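$$L(G, D) = L_{mce}(G) + \lambda L_{adver}(G, D) \tag{2}$$

In equation (2), $L_{mce}(G)$ is the pixel classification loss, $\lambda$ is used to balance the pixel classification loss and the adversarial loss, and $L_{adver}(G, D)$ is the adversarial loss.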
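As an illustration of how a pixel classification loss and an adversarial loss can be balanced by λ, the following Python sketch assumes the standard multi-class cross-entropy summed over all pixels. The function names `pixel_cross_entropy` and `mixed_loss` and the nested-list data layout are hypothetical and are not taken from the original implementation.

```python
import math


def pixel_cross_entropy(pred_prob, label, eps=1e-12):
    """Summed multi-class cross-entropy over all pixels.

    pred_prob[c][i][j]: predicted probability of class c at pixel (i, j).
    label[i][j]: integer index of the true class at pixel (i, j).
    eps avoids log(0); it is a numerical safeguard, not part of the loss.
    """
    loss = 0.0
    for i, row in enumerate(label):
        for j, true_class in enumerate(row):
            # Only the true class has a ground truth probability of 1,
            # so only its log-probability contributes to the sum.
            loss -= math.log(pred_prob[true_class][i][j] + eps)
    return loss


def mixed_loss(l_mce, l_adver, lam):
    """Classification loss plus lambda-weighted adversarial loss."""
    return l_mce + lam * l_adver


# Usage: a 2 x 3 image with 4 classes and a uniform prediction.
uniform = [[[0.25] * 3 for _ in range(2)] for _ in range(4)]
labels = [[0, 1, 2], [3, 0, 1]]
loss = pixel_cross_entropy(uniform, labels)  # close to 6 * ln(4)
```

A larger λ gives the discriminator's judgment more influence on the generator relative to the per-pixel classification term.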

Table 1: Meanings of important parameters.
Parameter | Meaning
Q 11, Q 21, Q 12, Q 22 | Coordinates around the midpoint
vH 1, vS 1, vV 1 | Means of the three HSV factors
s x 1 | Difference between pixel and mean

In addition, a combination of cross-entropy loss and adversarial loss is used to supervise both the bottom and top outputs of the dual output generator, as shown in the following equation:

$$L_{MMAN}(G, D_{Ma}, D_{Mi}) = L_{adver}(G, D_{Ma}) + \lambda_1 L_{mcel}(G) + \lambda_2 L_{adver}(G, D_{Mi}) + \lambda_3 L_{mceh}(G) \tag{3}$$

In equation (3), $L_{mcel}(G)$ represents the cross-entropy loss between the output low-resolution label image and the small-size label image, $L_{mceh}(G)$ represents the cross-entropy loss between the output high-resolution label image and the original-size label image, $L_{adver}(G, D_{Ma})$ represents the adversarial loss of the output low-resolution label image, and $L_{adver}(G, D_{Mi})$ represents the adversarial loss of the output high-resolution label image. The weights $\lambda_1$, $\lambda_2$, and $\lambda_3$ control the relative importance of the four losses. The training task of MMAN is as follows:

$$G^*, D_{Ma}^*, D_{Mi}^* = \arg\min_{G} \max_{D_{Ma}, D_{Mi}} L_{MMAN}(G, D_{Ma}, D_{Mi}) \tag{4}$$

In equation (4), $G^*$, $D_{Ma}^*$, and $D_{Mi}^*$ represent the dual output generator, the macrodiscriminator, and the microdiscriminator obtained after alternating optimization, respectively, and the optimization is complete when $L_{MMAN}(G, D_{Ma}, D_{Mi})$ converges.
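The way the four losses enter a single training objective can be sketched in Python as follows. The sketch assumes the standard GAN form of the adversarial loss (the mean of log D on real label maps plus the mean of log(1 − D) on generated ones) and hypothetical weights `w_low`, `w_micro`, and `w_high`; neither the exact loss form nor the weight values are confirmed by the text.

```python
import math


def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Standard GAN objective computed from discriminator outputs.

    d_real: discriminator outputs in (0, 1) for ground truth label maps.
    d_fake: discriminator outputs in (0, 1) for generated label maps.
    The discriminator tries to maximize this value, and the generator
    tries to minimize it.
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def mman_objective(l_mce_low, l_mce_high, l_adv_macro, l_adv_micro,
                   w_low=1.0, w_micro=1.0, w_high=1.0):
    """Weighted sum of the two cross-entropy and two adversarial losses.

    The weights are hypothetical hyperparameters that set the relative
    importance of the four terms.
    """
    return (l_adv_macro
            + w_low * l_mce_low
            + w_micro * l_adv_micro
            + w_high * l_mce_high)


# Usage: a discriminator that outputs 0.5 everywhere cannot tell real
# label maps from generated ones.
balanced = adversarial_loss([0.5, 0.5], [0.5, 0.5])  # close to 2 * ln(0.5)
```

In alternating optimization, one step would update the two discriminators to increase their adversarial terms, and the next step would update the generator to decrease the whole objective.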

In this study, a framework for the clothing color system based on an image-driven three-dimensional persona is proposed, which makes it possible to generate the virtual three-dimensional persona clothing model from an image quickly and conveniently. The framework of the model is shown in Figure 2.

As shown in Figure 2, the whole algorithm is divided into three steps. First, clothing recognition is performed on the input image to analyze the clothing in the image. Then, a similar three-dimensional character model is generated for the subsequent simulation, and the template model is selected and preprocessed. The created 3D persona model is only a simple 3D structure, so to make it possible to fit the clothing color to the model, the model must be unwrapped in UV space. UV unwrapping establishes the UV mapping of the three-dimensional model, which maps each surface position of the three-dimensional model to a position in a two-dimensional image. The color of each pixel of the three-dimensional model can then be filled in, and finally, the color change of the clothing can be presented on the three-dimensional model. In the unwrapped texture image, different bright colors are used to distinguish the coat and the trousers, and the gray values of the two colors are stored in the model library. The sample models in the model library are processed in the same way, so that a color-differentiated image is also obtained for each of them. At this point, the preprocessing of the model is complete.
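The role of UV mapping in this step can be illustrated with a minimal Python sketch that looks up a color for each model vertex from a 2D texture. It assumes that UV coordinates lie in [0, 1] with the origin at the top-left texel and uses nearest-texel lookup; the names `sample_texture` and `color_vertices` are hypothetical, and a real renderer such as OpenGL would perform this lookup on the GPU.

```python
def sample_texture(texture, u, v):
    """Return the texel nearest to the UV coordinate (u, v).

    texture[row][col] holds an (R, G, B) tuple. Assumption: u runs along
    the columns and v along the rows, both starting at the top-left.
    """
    height = len(texture)
    width = len(texture[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]


def color_vertices(texture, uv_coords):
    """Assign a texture color to every vertex from its UV coordinate."""
    return [sample_texture(texture, u, v) for u, v in uv_coords]


# Usage: a 2 x 2 texture and four vertices, one per texel.
texture = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 0)],
]
uvs = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (1.0, 1.0)]
vertex_colors = color_vertices(texture, uvs)
```

Changing the colors stored in the texture changes the color shown on the model without modifying its geometry, which is what allows the clothing color to be swapped on a fixed template.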

After the regions of the coat and trousers are recognized, the clothing texture and color must be restored on the 3D model by using the relevant information in the picture. First, the texture of the visible part of the clothing is mapped to the corresponding area of the 3D model according to the texture and color at the corresponding position of the image. During image mapping, a rotation operation is also needed to adjust the orientation of the processed model, and the rotated image is adapted to the size of the simulation image [23] [24] [25]. When image rotation is implemented, the reverse mapping method is adopted. Reverse mapping is the opposite of the mapping process: the coordinates obtained after rotation are mapped back to the corresponding coordinates in the original image.

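$$x' = x\cos\alpha + y\sin\alpha, \qquad y' = -x\sin\alpha + y\cos\alpha \tag{5}$$

In equation (5), $x$ and $y$ represent the abscissa and ordinate of the point in the original image obtained by reverse mapping, $x'$ and $y'$ represent the coordinates of the point after rotation, and $\alpha$ represents the image rotation angle. The results are as follows:

$$x = x'\cos\alpha - y'\sin\alpha, \qquad y = x'\sin\alpha + y'\cos\alpha \tag{6}$$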

Equation (6) relates the coordinates of the image after rotation to those before rotation, but the origin of an image is generally taken to be its upper left corner. If this method is applied to a nonsquare image, negative values may appear. Therefore, equation (7) is used for subsequent processing: the origin of the image is moved to the center of the image, and the coordinates in the original image are restored by equation (7). The calculation process is shown in the following equation:

In equation (8), sWid represents the width of the original image, sHgt represents the height of the original image, dWid represents the width of the image after rotation, and dHgt represents the height of the image after rotation. With these coordinates, the pixel information can be processed. To ensure that every pixel has a color value, bilinear interpolation is used, as shown in the following equation, where $f(\cdot)$ denotes the pixel value at a point:

$$f(x, y) \approx \frac{f(Q_{11})(x_2 - x)(y_2 - y) + f(Q_{21})(x - x_1)(y_2 - y) + f(Q_{12})(x_2 - x)(y - y_1) + f(Q_{22})(x - x_1)(y - y_1)}{(x_2 - x_1)(y_2 - y_1)} \tag{9}$$
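Rotation by reverse mapping with a centered origin and bilinear interpolation can be sketched in Python as follows. The sketch assumes a single-channel image stored as nested lists and places the origin at the center of the pixel grid, i.e. at (width − 1)/2 and (height − 1)/2; the function names and the `fill` parameter are hypothetical, and the exact centering used in the study may differ.

```python
import math


def bilinear(img, x, y):
    """Bilinear interpolation of img[row][col] at a real-valued point.

    x is the column coordinate and y the row coordinate; both must lie
    inside the image.
    """
    height, width = len(img), len(img[0])
    x1 = min(max(int(math.floor(x)), 0), width - 1)
    y1 = min(max(int(math.floor(y)), 0), height - 1)
    x2 = min(x1 + 1, width - 1)
    y2 = min(y1 + 1, height - 1)
    tx, ty = x - x1, y - y1
    top = img[y1][x1] * (1 - tx) + img[y1][x2] * tx
    bottom = img[y2][x1] * (1 - tx) + img[y2][x2] * tx
    return top * (1 - ty) + bottom * ty


def rotate(img, alpha_deg, fill=0.0):
    """Rotate img by alpha_deg degrees using reverse mapping."""
    s_hgt, s_wid = len(img), len(img[0])
    a = math.radians(alpha_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Bounding box of the rotated image.
    d_wid = int(round(abs(s_wid * cos_a) + abs(s_hgt * sin_a)))
    d_hgt = int(round(abs(s_wid * sin_a) + abs(s_hgt * cos_a)))
    out = [[fill] * d_wid for _ in range(d_hgt)]
    tol = 1e-9  # tolerance for floating-point error at the borders
    for yd in range(d_hgt):
        for xd in range(d_wid):
            # Move the origin to the center of the rotated image.
            xc = xd - (d_wid - 1) / 2.0
            yc = yd - (d_hgt - 1) / 2.0
            # Reverse mapping, then move the origin back to the corner
            # of the original image.
            xs = xc * cos_a - yc * sin_a + (s_wid - 1) / 2.0
            ys = xc * sin_a + yc * cos_a + (s_hgt - 1) / 2.0
            if -tol <= xs <= s_wid - 1 + tol and -tol <= ys <= s_hgt - 1 + tol:
                xs = min(max(xs, 0.0), s_wid - 1.0)
                ys = min(max(ys, 0.0), s_hgt - 1.0)
                out[yd][xd] = bilinear(img, xs, ys)
    return out
```

Because every destination pixel is traced back to a source position, no holes appear in the rotated image; destination pixels that fall outside the source keep the `fill` value.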

In equation (9), $x$ and $y$ represent the abscissa and ordinate of the middle point of the image, and $Q_{11}$, $Q_{21}$, $Q_{12}$, and $Q_{22}$ represent the positions around the middle point, with coordinates $(x_1, y_1)$, $(x_2, y_1)$, $(x_1, y_2)$, and $(x_2, y_2)$, respectively. This study performs its calculations in HSV color space, but problems such as clothing pattern interference and recognition errors can still occur, so extreme values need to be eliminated to obtain a brighter clothing color. To eliminate extreme values, the mean of each of the three HSV channels is calculated first, as in the following equation:

$$vH_1 = \frac{1}{n}\sum_{i=1}^{n} H_i, \qquad vS_1 = \frac{1}{n}\sum_{i=1}^{n} S_i, \qquad vV_1 = \frac{1}{n}\sum_{i=1}^{n} V_i \tag{10}$$

In equation (10), $vH_1$ represents the mean value of hue, $vS_1$ is the mean value of saturation, and $vV_1$ represents the mean value of lightness; $H_i$, $S_i$, and $V_i$ are the three channel values of the $i$th pixel in HSV color space, and $n$ is the number of pixels. All pixels are then traversed through the following equation:

In equation (11), s x 1 represents the difference between each pixel and the mean value. After extreme values are eliminated on this basis, no pixel value is excessively large, which reduces the deviation of the pixel color values. Finally, the proposed method is used to construct the 3D virtual character clothing model, and the generated model is color rendered.
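A minimal Python sketch of extreme value elimination is given below. The exact elimination rule is not stated in the text, so the sketch assumes that each pixel's difference from the mean is the Euclidean distance over the three HSV channels (all scaled to [0, 1]) and that pixels farther than a hypothetical `factor` times the average difference are discarded. It also treats hue as a linear value, which ignores the wrap-around at red.

```python
import math


def channel_means(pixels):
    """Mean of each of the three HSV channels over all pixels."""
    n = len(pixels)
    return tuple(sum(p[k] for p in pixels) / n for k in range(3))


def remove_extremes(pixels, factor=1.5):
    """Drop pixels whose difference from the mean is unusually large.

    pixels: list of (h, s, v) tuples with every channel in [0, 1].
    factor: hypothetical threshold relative to the average difference.
    """
    means = channel_means(pixels)
    diffs = [
        math.sqrt(sum((p[k] - means[k]) ** 2 for k in range(3)))
        for p in pixels
    ]
    limit = factor * (sum(diffs) / len(diffs))
    kept = [p for p, d in zip(pixels, diffs) if d <= limit]
    # Fall back to all pixels if the rule would remove everything.
    return kept or list(pixels)


def dominant_color(pixels, factor=1.5):
    """Mean HSV color after extreme values have been eliminated."""
    return channel_means(remove_extremes(pixels, factor))


# Usage: nine pixels of the garment color and one dark pattern pixel.
garment = [(0.5, 0.8, 0.8)] * 9 + [(0.5, 0.0, 0.0)]
plain_mean = channel_means(garment)   # pulled toward the dark pixel
cleaned_mean = dominant_color(garment)
```

In this example the plain mean has saturation and lightness of about 0.72, while the cleaned mean returns to about 0.8, which matches the observation that the color extracted after elimination is brighter and more saturated.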

This paper analyzes the performance of automatic recognition of the MMAN algorithm.

The experiment is run on a remote control server. The system is Linux, and the GPU memory is 8 GB [26]. In addition, the deep learning framework is built and run on CUDA 8.0, and the program is written in Python 3.7. The image database used in the experiment is the LIP model library; 20% of the model library is used as the test set, and 80% is used as the training set. The first step is to train and test the deep learning model. The results are shown in Figure 3.

As can be seen from Figure 3, the accuracy of the model on the training set gradually improves as the number of iterations increases. When the number of iterations reaches 20, the accuracy reaches its maximum value of 0.972, and it then remains almost unchanged as the number of iterations increases further. In addition, the loss rate of the model decreases significantly as the number of iterations increases; after the number of iterations reaches 30, the loss rate drops to its lowest value and remains unchanged for a long time. After that, the convergence of the MMAN algorithm is analyzed in the experiment. The convergence of the algorithm is shown in Figure 4.

As shown in Figure 4, the algorithm converges well. The two parameters lossD-r and lossD-f represent, respectively, the scalar values that the discriminator returns for images when paired with the generator. Both values approach 0.5 as the test time increases. LossG represents the scalar value of the image output by the generator. As can be seen from Figure 5, its value gradually approaches 0. The values of the three parameters show that the MMAN algorithm is a good adversarial network, which means that the image generated by the generator can pass the discriminator and be used in subsequent operations.

The experimental environment is the Windows 7 SP1 operating system, the host has an Intel Core i7-2600 CPU, the system memory is 8 GB, the program runs in Visual Studio 2013, and the OpenGL interface is used for the final model display and human-computer interaction. First, the extreme value elimination used in the experiment is analyzed, and its advantages are discussed, as shown in Figure 5.

As shown in Figure 5 , in the experiment, the results of clothing color style extraction under the average value are compared with those obtained after eliminating the extreme value. It is not difficult to see that the results of clothing color style obtained by the two methods after processing the input image are almost the same. However, it can be concluded from the final presentation effect that the extracted value of clothing color style after the elimination of the extreme value is brighter, and the color saturation has also been significantly improved. From the visual effect point of view, it is more suitable for human visual senses.

Then, the color model of the 3D virtual character clothing is constructed, and the results are shown in Figure 6. As shown in Figure 6, the final three-dimensional model is relatively clear for both men and women, and the clothing color of the characters in the image is reproduced accurately in the model. At the same time, the pattern colors on the clothing are expressed clearly, and their positions on the model are almost the same as those in the original picture. To show the effect of the 3D virtual persona clothing color model more scientifically and objectively, this experiment also analyzes the color difference after the model is established, through regional division of the model and objective color error detection, as shown in Figure 7.
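A regional color difference check of this kind can be sketched in Python as follows. The sketch assumes that the difference is measured on lightness values in [0, 1], that the original image and the rendered model have been sampled at the same positions, and that regions are contiguous chunks of equal size; the function name `regional_differences` and these choices are hypothetical, since the text does not specify how regions are formed.

```python
def regional_differences(original_v, model_v, n_regions=100):
    """Mean lightness difference (model minus original) per region.

    original_v, model_v: equally long lists of lightness values sampled
    at corresponding positions. A positive result means the model is
    brighter than the original in that region; a negative one, darker.
    """
    n = len(original_v)
    if n != len(model_v):
        raise ValueError("both inputs must have the same length")
    if not 0 < n_regions <= n:
        raise ValueError("n_regions must be between 1 and the sample count")
    diffs = []
    for r in range(n_regions):
        start = r * n // n_regions
        end = (r + 1) * n // n_regions
        size = end - start
        mean_original = sum(original_v[start:end]) / size
        mean_model = sum(model_v[start:end]) / size
        diffs.append(mean_model - mean_original)
    return diffs


# Usage: a model that is slightly brighter in the first half and
# slightly darker in the second half.
original = [0.5] * 200
model = [0.505] * 100 + [0.495] * 100
diffs = regional_differences(original, model, n_regions=100)
```

Plotting `diffs` against the region index gives a curve of the same kind as a color difference plot, where small fluctuations around 0 indicate that the model preserves the original color.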

As shown in Figure 7, a value greater than 0 means the color is brighter, and a value less than 0 means the color is darker. The established male and female models are divided into 100 regions for color difference detection. The clothing color curves of the male and female 3D virtual models fluctuate only slightly. Compared with the original color, the color difference curve of the 3D model changes little in light and shade. These results show that the three-dimensional virtual character clothing color model established in this experiment is successful: it clearly shows the clothing color information of the original picture, which makes later virtual try-on easier for the user. The final effect of the model needs to be verified by case analysis. This study randomly invited volunteers of different grades from different universities, because college students are relatively heavy online consumers, so a survey of college students is persuasive. Considering that the construction of the 3D human model is affected when a picture contains multiple characters, only single-character pictures are collected in this study. There are 115 volunteers in this experiment, including 51 males and 64 females.

The survey covers model aesthetics, clothing color, visual effect, and degree of interest. The total score over all items is 100 points. The results are shown in Figure 8.

As shown in Figure 8 , the average score of male volunteers on the beauty of the model established in the experiment is higher, reaching 98 points, and the score of female volunteers on the beauty of the model is also up to 95 

With the continuous development of the Internet, people increasingly favor the convenience it brings, so fitting systems for online clothing shopping have come into view. However, immature virtual fitting systems have many problems, such as clothing color interference. Therefore, this study proposes a three-dimensional virtual human model based on deep learning and presents the clothing color on the human model with the help of clothing color extraction technology. In the process of virtual modeling, deep learning technology is first used to analyze the color of human clothing, and the coordinates of the texture color are confirmed by reverse mapping. To meet the visual effect of color, extreme values are also eliminated to ensure the quality of color in the model. The final results show that the MMAN algorithm is highly effective, with an accuracy of 0.972. At the same time, in practical application, the model has high definition, complete clothing color and pattern, and clear positions.

Through the scientific and objective evaluation of clothing color, it is concluded that the color difference between the clothing color in the model and the original image is always within 0.01. The results of the survey also show that the model proposed in the study is in line with public aesthetics and that the display of clothing color can attract a large number of users. To sum up, the 3D virtual character clothing color expression model established by deep learning is effective, and the clothing color remains intact, which provides users with a good visual effect and helps them make a satisfactory decision when selecting clothing online. In addition, the results of this study will promote the development of fitting systems and improve quality of life to a certain extent.

The data used to support the findings of this study are available from the corresponding author upon request.

[1] Automatic 3D virtual fitting system based on skeleton driving

[2] The design of a remote fitting system for garment e-commerce

[3] Parametric shape estimation of human body under wide clothing

[4] A generic method of wearable items virtual try-on

[5] 3D clothing transfer in virtual fitting based on UV mapping

[6] Clothing size recommender on real-time fitting simulation using skeleton tracking and rigging

[7] Application of intelligent virtual reality technology in clothing virtual wear and color saturation after COVID-19 epidemic situation

[8] Simultaneous identification of human body model parameters and gait trajectory from 3D motion capture data

[9] Human model-based active driving system in vehicular dynamic simulation

[10] SeqViews2SeqLabels: learning 3D global features via aggregating sequential views by RNN with attention

[11] Sub3DNet1.0: a deep-learning model for regional-scale 3D subsurface structure mapping

[12] Feature extraction method of 3D art creation based on deep learning

[13] Main color extraction algorithm and its application in clothing image retrieval

[14] Style-controlled synthesis of clothing segments for fashion image manipulation

[15] Video source identification algorithm based on 3D geometric transformation

[16] Perceptually uniform color space for visualizing trivariate linear polarization imaging data

[17] Research on clothing simulation design based on three-dimensional image analysis

[18] Electrophysiological dynamics of antagonistic brain networks reflect attentional fluctuations

[19] Neural antagonistic mechanism between default-mode and task-positive networks

[20] Applicability of a single depth sensor in real-time 3D clothes simulation: augmented reality virtual dressing room using Kinect sensor

[21] In-home application (App) for 3D virtual garment fitting dressing room

[22] Human face sketch to RGB image with edge optimization and generative adversarial networks

[23] Fuzzy based image edge detection algorithm for blood vessel detection in retinal images

[24] Modeling visual fields using virtual ophthalmoscopy: incorporating geometrical optics, morphometrics, and 3D visualization to validate an interdisciplinary technique

[25] Big data service architecture: a survey

[26] 3DBodyNet: fast reconstruction of 3D animatable human body shape from a single commodity depth camera

Acknowledgments. The study was supported by the Chongqing Art Science Research Planning Project, China (Grant no. 19QN06).