title: Moving Target Location Method of Video Image Based on Computer Vision
authors: Li, Xiao-xia; Zhang, Hai-yan
date: 2020-06-08
journal: Multimedia Technology and Enhanced Learning
DOI: 10.1007/978-3-030-51100-5_16

Through the localization and recognition of human moving targets in video images, combined with human motion feature information, moving target localization and visual reconstruction are realized. This paper analyzes the feature quantities of moving objects in video images, improves the level of motion training, and proposes a moving target positioning technique for video images based on computer vision and 3D feature point reconstruction. According to the positions of the moving feature parts of the human body, 3D information modeling and image acquisition of the moving target are carried out using video information acquisition and spatial feature scanning methods. The moving feature points of the collected moving target video images are calibrated and arranged, and the 3D edge contour feature point set of the human skeleton is extracted and represented as a high-dimensional vector to form a regular feature database of moving target video images. The moving points in this database are then fused to realize reconstruction of the moving target video image and location of the moving target. The simulation results show that the method achieves good real-time performance and accuracy in moving target location, a strong ability to mark human motion points in 3D, and high accuracy in extracting motion features.

Human body movement is random, scattered and nonlinear. To improve the quantitative analysis of human motion and provide scientific guidance for motion training, it is necessary to reconstruct a three-dimensional image of the feature points of human motion [1]. With the development of video image tracking and scanning technology, video image tracking can be used to reconstruct the feature points of human motion, to overcome the randomness and nonlinearity of human action, and to avoid inaccurate reconstruction, stiffness and rough edges. By tracking the moving target feature points of the video image, human motion is reconstructed and recognized, and the regular characteristics of human motion are depicted from the motion process, improving the scientific basis of motion training. The study of video image moving target feature point tracking has therefore attracted great attention from experts and scholars. Traditional tracking techniques for moving target feature points in video images mainly comprise human motion feature fusion scanning methods based on point cloud technology, quantitative tracking and recognition methods based on gesture information, and motion tracking and recognition methods based on computer vision feature extraction.
The moving target feature quantities of the video image are analyzed by methods such as image processing and attitude sensing, and data mining and information fusion of the human motion points are carried out to improve sports skill and level. The relevant literature has been studied and certain motion guidance and action correction performance has been obtained. The method of reference [2] uses a matrix F-norm constraint on the residual term to solve the orthogonal Procrustes regression model, which makes the model very sensitive to some kinds of noise (e.g., illumination). It replaces the original F-norm constraint with a more robust L1-norm constraint and proposes a sparse orthogonal Procrustes regression model, which can be solved by an effective alternating iterative algorithm. The experimental results show that the model can deal effectively with changes of face pose; however, its accuracy is only moderate, and the accuracy of 3D marking and feature extraction is not high. In reference [3], a facial expression modeling method based on interval algebra Bayesian networks captures not only the spatial relations of the face but also its complex temporal relations, so that facial expressions can be recognized more effectively. The method improves the speed of training and recognition by using only tracking-based features, without manually marking peak frames; however, its accuracy is also only moderate, and the accuracy of 3D marking and feature extraction is not high. A three-dimensional attitude information acquisition and three-dimensional reconstruction method for motion characteristics is presented in reference [4]: a three-dimensional laser scanning method is adopted to acquire the motion characteristics of the human target, and a tracking system for the target action is constructed. The method reads in the human motion characteristic information in batches by a sampling-point interpolation method and realizes three-dimensional information reconstruction and volume rendering of the human dynamic information, improving the accuracy of tracking and identifying human motion points. However, the method cannot precisely register the control points when interfered with by large motion characteristics, which causes dynamic accumulation of motion feature points and increases the computational cost. A method for establishing a three-dimensional human body model of an athlete based on laser scanning technology is proposed in reference [5]: the human body model is built on the human skeleton model, three-dimensional human body model reconstruction is carried out by a least-squares regression expansion algorithm, and moving target positioning in three-dimensional scattered video images is realized; the problem with this method is that the tracking and identification accuracy for high-dimensional motion pose information is not high [6]. In this paper, a moving target location technique for video images based on computer vision and 3D feature point reconstruction is proposed. Firstly, 3D information modeling and image acquisition of the moving target are carried out by using video information acquisition and spatial feature scanning methods.
According to the moving feature parts of the human body, the moving feature points of the collected moving target video images are calibrated and arranged, and the 3D edge contour feature point set of the human skeleton is extracted and represented as a high-dimensional vector to form a regular feature database of moving target video images. Then, the moving points in this regular feature database are fused, and moving target video image reconstruction and moving target location are realized. Finally, simulation experiments are carried out to show that the method improves the accuracy of moving target location and recognition in video images.

2 Motion Target in Visual Space

In order to realize video image tracking and recognition of moving target feature points, the method of video information acquisition and spatial feature scanning is first used to model the three-dimensional information of the moving target. Through three-dimensional image acquisition of human motion, and combined with the collected images, the motion feature points are calibrated and processed to realize information fusion and tracking recognition of human motion points [7]. The three-dimensional tracking and scanning method of visual space is used to obtain dynamic information on the human body, and a two-dimensional action manifold analysis model of the human motion points is constructed. The human motions studied in this paper mainly include standing, upper limb motion and lower extremity motion, and the three-dimensional scattered points of the human motion points are calibrated, as shown in Fig. 1. According to the calibration result of the human motion feature points given in Fig. 1, the human body is characterized by a skeleton, and the visual-space scan image is represented as a graph G = (V, E), in which the path between joint nodes u and v has length d_G(u, v). Taking the three-dimensional visual-space scan image of the human body as a pixel sequence and inputting the original pixel feature data, the target sequence of human motion features u = (y_0, z_0, k, φ)^T can be calculated. The axis of the three-dimensional information model of the moving object can be computed by the least-squares method; the coordinate values with u(3) = u and the direction along the axis of the human motion attitude distribution are calculated, thereby realizing image acquisition for three-dimensional information modeling of the moving target and providing accurate data input for tracking the feature points of moving objects in video images [8]. According to the motion feature parts of the human body, the acquired moving target video image is calibrated and arranged, the inverse mapping of action points on the two-dimensional manifold is realized, and a three-dimensional model based on a cylindrical-surface fitting algorithm is established, which describes the coordinate position of the visual space in the spatial coordinate system as p* = (X_{cs2}, θ*, ρ*). The three-dimensional information model of the moving object can then be obtained by the cylindrical-surface fitting algorithm: set the edge pixel set of the three-dimensional image, let the cylindrical surface of the motion-space distribution be parameterized by (θ_e, ρ_e), and let its generatrix be parallel to the x-axis.
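The cylindrical-surface fitting step is described only in prose, so the following is a minimal sketch of one common way to realize it under the stated constraint that the generatrix is parallel to the x-axis: an algebraic least-squares circle fit on the (y, z) projection of the scattered motion points. The function name fit_cylinder_x_axis and the synthetic test data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_cylinder_x_axis(points):
    """Fit a cylinder whose generatrix is parallel to the x-axis to 3D scattered
    points, using an algebraic least-squares circle fit on the (y, z) projection.
    `points` is an (N, 3) array of scattered motion sample points.
    Returns the axis location (y_c, z_c) in the (y, z) plane and the radius R."""
    pts = np.asarray(points, dtype=float)
    y, z = pts[:, 1], pts[:, 2]
    # Algebraic circle model: y^2 + z^2 + D*y + E*z + F = 0  (linear in D, E, F)
    A = np.column_stack([y, z, np.ones_like(y)])
    b = -(y**2 + z**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    y_c, z_c = -D / 2.0, -E / 2.0
    R = np.sqrt(y_c**2 + z_c**2 - F)
    return (y_c, z_c), R

# Example: noisy points on a cylinder of radius 0.5 around the line y = 1, z = 2
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
x = rng.uniform(0, 3, 200)
pts = np.column_stack([x, 1 + 0.5 * np.cos(t), 2 + 0.5 * np.sin(t)])
pts += rng.normal(scale=0.01, size=pts.shape)
axis_yz, radius = fit_cylinder_x_axis(pts)
print(axis_yz, radius)  # approximately (1.0, 2.0) and 0.5
```

With the generatrix direction fixed along x, the fit reduces to a linear least-squares problem, which is consistent with the paper's use of the least-squares method for computing the modeling axis.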
The following formula can be used to describe all the action vectors in the high-dimensional Euclidean space. The set matrix representing the feature points of human action is obtained by formula (2), and a translation along the x-axis is carried out with ρ_e − R as the unit of displacement, so that the three-dimensional manifold distribution describing n human cohesion actions is obtained, as shown in Fig. 2. The training set of human motion feature parts is extracted from the high-dimensional Euclidean space, a joint action vector is constructed in the area of the motion distribution range of the human body, and the feature values of the moving target feature points in video image M are sorted by magnitude; the arranged human motion feature points are shown in Fig. 3. According to the arrangement result of Fig. 3, the maximum gray-level contour points are marked, and the high-dimensional vector I(i, j) of the moving point set output by the three-dimensional human visual-space scan is obtained, where I(k) is the manifold vector mapping all three-dimensional scattered human motion sample points to a low-dimensional space, and the motion feature information is marked according to the three-dimensional edge contour feature point set of the human skeleton [9-11].

3 The Realization of Moving Target Location in Three-Dimensional Scattered Video Images

Based on the construction of the human motion process and the extraction of moving feature points by video information acquisition and spatial feature scanning, the moving target location of 3D scattered video images is designed, and the regular feature database of moving target video images is formed by extracting the 3D edge contour feature point set of the human skeleton [12]. The visual spatial information fusion method is used to track and recognize the moving points in this regular feature database. The irregular 3D point data of human motion are regularized and fused. After simple linear interpolation, the effective frames of the video image tracking of the scattered human motion points are obtained, where d_ij(Z) is the Euclidean distance between pixel points and d_X(x_i, x_j) denotes the edge pixel points used for registering the motion dynamic information. According to the feature space decomposition method, the motion feature points of the athlete's three-dimensional human body model are adaptively ordered, the template features of the three-dimensional human body model are established, and the acquired moving target video image is divided into pixel points. The method comprises the following steps: obtain the M × N sub-blocks G_{m,n} containing the human motion feature points, and discretize the motion-rule attributes of the human motion points in a two-dimensional quantization space [13], obtaining the feature acquisition result of the athlete's motion feature point model as follows, in which u ∈ {1, 2} and v ∈ {1, 2} represent the relevant factors of visual-space fusion of the moving target video image, and the three-dimensional visual-space fusion method is used to conduct image retrieval.
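The division of the acquired video image into M × N sub-blocks G_{m,n} and the discretization of motion attributes in a two-dimensional quantization space are described only in prose. The sketch below shows one plausible reading, assuming a dense per-pixel motion field as input and a joint (magnitude, orientation) quantization per block; the function name, the 8 × 8 block grid and the bin counts are illustrative choices, not values from the paper.

```python
import numpy as np

def block_quantized_motion_features(flow, m=8, n=8, mag_bins=4, ori_bins=8):
    """Discretize a dense motion field into per-block histograms, as a rough
    analogue of dividing the frame into sub-blocks G_{m,n} and quantizing the
    motion attributes of each block in a 2D (magnitude, orientation) space.
    `flow` is an (H, W, 2) array of per-pixel motion vectors (dx, dy).
    Returns an (m, n, mag_bins * ori_bins) feature array."""
    H, W, _ = flow.shape
    mag = np.hypot(flow[..., 0], flow[..., 1])
    ori = np.arctan2(flow[..., 1], flow[..., 0])  # orientation in [-pi, pi]
    # Quantize magnitude and orientation into discrete levels
    mag_q = np.minimum((mag / (mag.max() + 1e-9) * mag_bins).astype(int), mag_bins - 1)
    ori_q = ((ori + np.pi) / (2 * np.pi) * ori_bins).astype(int) % ori_bins
    joint = mag_q * ori_bins + ori_q  # joint 2D quantization index per pixel
    feats = np.zeros((m, n, mag_bins * ori_bins))
    ys = np.linspace(0, H, m + 1, dtype=int)
    xs = np.linspace(0, W, n + 1, dtype=int)
    for i in range(m):
        for j in range(n):
            block = joint[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
            feats[i, j] = np.bincount(block, minlength=mag_bins * ori_bins)
    return feats

# Example with a synthetic motion field
flow = np.random.default_rng(1).normal(size=(120, 160, 2))
print(block_quantized_motion_features(flow).shape)  # (8, 8, 32)
```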
The posture and movement of the human body are constructed, and the second moment of the image visual-space fusion is obtained. According to the multiple key points after image visual-space fusion, the posture characteristics of the human body are represented and used as a quantitative factor to realize video image tracking and recognition [14]. The adjustment parameters of the three-dimensional scattered motion points are set, and the visual reconstruction of the human motion attitude model is carried out by the key point and frame point information feature matching method. The neighborhood characteristics of the original motion attitude data are used to identify the human action, and the inverse mapping of the human motion sample points is obtained. The Euclidean distance between each video image tracking point and the pixel points is calculated, and for the k adjacent points, the autocorrelation feature matching method is used to obtain the 3D moving target video image reconstruction result, where u_kl is the stable pixel value in the process of optimizing the scattered points of human motion, θ_kl is the edge pixel set of the k sample points, and e_i(t) represents the optimal reconstruction weight. For the N input human motion feature vectors, a support vector machine classifier is used to classify the motion features. According to the video image tracking and recognition method, the tracking quantization function of a motion action point is obtained, in which [w^{lk}_{i1}, ..., w^{lk}_{in}] is the control kernel function of the three-dimensional edge contour feature point set and [x_1(t − k), ..., x_n(t − k)]^T is the motion action tracking control coefficient vector. The moving points in the regular feature database of moving target video images are mapped to a low-dimensional space for feature compression, reducing the computational overhead, and the set of output samples of the video image tracking measurement of the moving target feature points is obtained as:

SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ   (12)

The three-dimensional video image tracking information representing the human motion points can be obtained by the above formula, and N motion identification elements are obtained in the feature space C, thereby realizing three-dimensional scattered video image tracking control of the human motion points [15]. In order to test the performance of this method in tracking and recognizing moving target feature points in video images, simulation experiments are carried out. Matlab 2012 and Visual C simulation software are used for image processing and data analysis. The normal deviation of data sampling of the moving action feature points is set to 0.21, and the allowable accuracy of the corresponding 3D reconstruction surface is 0.34. After sampling, the fitting parameter of the point cloud is set to 12, the regression parameter of the scattered points of human motion is set to 0.01, the number of adjacent points k for pixel partition is set to 12, the signal-to-noise ratio of 3D image acquisition in visual space is −12 dB, and a 302 × 250 pixel 3D mannequin image is used as the test set. According to the above parameter settings, the moving target feature points of the video image are collected, the three-dimensional information modeling and image acquisition of the moving target are realized, and the results are shown in Fig. 4.
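Formula (12) is the standard SSIM product of luminance, contrast and structure terms. The sketch below computes it globally over two frames for clarity (the usual SSIM definition applies local sliding windows), with the stabilizing constants c1, c2, c3 chosen by the common SSIM convention rather than taken from the paper; the 302 × 250 frame size in the example simply mirrors the test image size stated above.

```python
import numpy as np

def ssim_score(x, y, alpha=1.0, beta=1.0, gamma=1.0, data_range=255.0):
    """Global structural similarity in the form of formula (12):
    SSIM(x, y) = l(x, y)^alpha * c(x, y)^beta * s(x, y)^gamma,
    computed over whole images rather than local windows for simplicity."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sig_x, sig_y = x.std(), y.std()
    sig_xy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)      # luminance term
    c = (2 * sig_x * sig_y + c2) / (sig_x**2 + sig_y**2 + c2)  # contrast term
    s = (sig_xy + c3) / (sig_x * sig_y + c3)                   # structure term
    return (l ** alpha) * (c ** beta) * (s ** gamma)

# Example: a reference frame and a slightly noisy reconstruction of it
rng = np.random.default_rng(2)
ref = rng.uniform(0, 255, size=(302, 250))
rec = np.clip(ref + rng.normal(scale=5.0, size=ref.shape), 0, 255)
print(ssim_score(ref, rec))  # close to 1 for a faithful reconstruction
```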
The moving target feature points of the video image given in Fig. 4 are used as the training set, and the human motion feature points are calibrated and arranged through the video image tracking technology, so that three-dimensional visual-space image reconstruction is realized; the three-dimensional visual-space reconstruction result of the human motion model is shown in Fig. 5. Analysis of Fig. 5 shows that the method collects the video image moving object feature points and reconstructs the visual-space image, effectively reflecting the visual structural characteristics of human motion; video image tracking and recognition is then carried out on the basis of the key motion feature points, and the results are shown in Fig. 6 (video image tracking recognition of human motion points). To quantitatively analyze the performance of the method in tracking and recognizing human motion in video images, Table 1 compares the tracking offset, computational cost and output signal-to-noise ratio of different methods. The proposed method has a low video image tracking offset, good description accuracy and short computation time, so the real-time performance and adaptivity of tracking are improved, the output signal-to-noise ratio is high, and the overall performance is better.

In this paper, the problem of tracking, recognition and feature analysis of human motion points is studied. Combined with the human motion feature information in video images, moving target location and visual reconstruction are realized, the moving target feature quantities of the video image are analyzed, and the level of motion training is improved. A video image moving target location technique based on computer vision and 3D feature point reconstruction is proposed. The 3D information modeling and image acquisition of the moving target are carried out by using video information acquisition and spatial feature scanning methods, the moving feature points of the collected moving target video images are calibrated and arranged, the 3D edge contour feature point set of the human skeleton is extracted and represented as a high-dimensional vector, and image fusion processing is carried out to realize moving target video image reconstruction and moving target location. It is found that the method achieves good real-time performance and accuracy in moving target location in video images, with low deviation and good adaptive ability.
References
[1] Graph embedding and extensions: a general framework for dimensionality reduction
[2] Sparse orthogonal Procrustes problem based regression for face recognition with pose variations
[3] Facial expression recognition using temporal relations among facial movements
[4] Remote sensing image fusion via sparse representations over learned dictionaries
[5] Analysis of accuracy in orbit predictions for space debris using semianalytic theory
[6] Image denoising using trivariate prior model in nonsubsampled dual-tree complex contourlet transform domain and non-local means filter in spatial domain
[7] 3D face modeling and validation in cross-pose face matching
[8] Optimization approach for multi-scale segmentation of remotely sensed imagery under k-means clustering guidance
[9] An MPI-based parallel pyramid building algorithm for large-scale RS image
[10] Supervised feature extraction based on orthogonal discriminant projection
[11] Joint embedding learning and sparse regression: a framework for unsupervised feature selection
[12] Global contrast based salient region detection
[13] DHSNet: deep hierarchical saliency network for salient object detection
[14] Spatiotemporal saliency detection using textural contrast and its applications
[15] Hierarchical saliency detection