Research on Optimization of Memory Pool Management for Highly Concurrent Service Requests

LIU Pingping, LU Zhaopan
School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China
Email: 1341369601@qq.com

Abstract. In order to return information to the user quickly and accurately after a keyword is entered, and to reduce the impact on program performance when the search system frequently allocates and deallocates memory under high concurrency, three memory pools were designed according to the characteristics of different search engine scenarios: the Recoverable Fixed Length Memory Pool, the Recoverable Variable Length Memory Pool, and the Allocate Not Free Memory Pool. The results show that, compared with the system's default memory allocator, the efficiency of the Recoverable Fixed Length Memory Pool is increased by 70.20%, the efficiency of the Recoverable Variable Length Memory Pool is increased by 13.84%, and the efficiency of the Allocate Not Free Memory Pool is increased by 90.80%.

Keywords: high concurrency, search engine, memory pool, allocator

1. Introduction

The search engine is one of the most important applications of the Internet, involving information retrieval, distributed processing, the semantic web, data mining, and more. Reasonable data structure design, the index, and a highly concurrent system structure are all factors that influence query speed. The basic principles of search engines have become very stable, but in terms of service, quality and performance still need to be optimized. Most traditional search engines use keyword matching; the system can manage memory adequately when allocation and release are infrequent, but in the face of massive data processing and storage this approach falls short.

Directly using the system calls Malloc/Free and New/Delete[1] to allocate and release memory has several drawbacks. Malloc/New searches the free-block table according to a "first fit", "best fit" or similar algorithm to find a free block, so memory utilization is not high. The system may need to merge free blocks when Free/Delete is called, which incurs extra time and space overhead. Frequent use easily produces a large number of memory fragments, which reduces the efficiency and stability of the program. And memory leaks[2] become more likely as memory consumption keeps increasing until memory is exhausted.

Regarding the memory allocation problem, Wang Xiaoyin, a professor at Xi'an University of Posts and Telecommunications, analyzed the method and principle of establishing a memory pool in "Implementation and Application of the Memory Pool in Linux Kernel"[3]. Memory allocated from a memory pool does not need to be released individually; it is released when the memory pool is destroyed. The advantages are: allocation is fast, since when a block of memory is available only simple operations such as a size check and a pointer offset are performed; the payload for small allocations is high, since little additional bookkeeping information is required; memory allocated from the pool usually does not need a separate release, but is recovered in a unified way; and apart from using the pool's allocation functions instead of malloc, no other special conventions are needed.
This paper therefore compares traditional search engine memory allocation with memory pool allocation. Different memory pools are designed for different application scenarios, so that memory allocation is managed to achieve the fastest possible allocation and release. For the user's query, the system's memory management is taken over completely by the programmer, which makes it easier to investigate problems and optimize the system, and to quickly return satisfactory results to the user.

2. Principles and Key Technologies of Search Engines

A search engine builds a database from information extracted from web sites, searches for records matching the user's query conditions, and then returns the results to the user in a certain order. The working principle[4] of a search engine is divided into four steps: first, web crawler technology[5] automatically grabs pages from the Internet; then the original pages are analyzed; an index database is set up; and finally searching and sorting are carried out in the index database.

When multiple threads are running and the system has only one CPU, the threads cannot all execute at the same time: the CPU's running time is divided into several periods, each period is allocated to a thread, one thread's code runs at a time, and the other threads are suspended. This is what we call concurrency. Under high concurrency, the search system's frequent allocation and recovery of memory degrades program performance: the memory is used in a particular way, yet the performance cost of general-purpose features that are not required is still paid. For a long-running background service system, the performance decrease is mainly due to the default memory management being universal; general memory management must consider many factors, including threading, size, recovery time and allocation frequency. For this reason, it is common to use a memory pool to manage memory allocation, rather than simply using New/Delete and Malloc/Free for dynamic allocation. By designing dedicated memory pools to allocate specific memory and optimize performance in the different application scenarios of the search system, mass data storage and search speed can be improved, solving the problems of universal memory management.

2.1 Principle of the memory pool

A memory pool[6] is a way of allocating memory: a facility that can dynamically allocate memory but can only be used by a specific component (the owner of the pool). In a kernel, the owner usually does not use the memory pool directly; when an ordinary memory allocation fails, the kernel calls a particular memory pool function to draw from the pool and obtain extra memory. The memory pool is thus a reserve of memory used at specific times.

As shown in Figure 1, the memory pool contains a total of 4 memory blocks. When the memory pool is first created, only one block of memory is requested from the system, and the returned pointer acts as the head of the entire memory pool. As the application continues to demand memory, the memory pool decides that it needs to expand dynamically, once again requests a new memory block from the system, and links all of these memory blocks together with pointers. From the operating system's point of view, four equal-sized memory blocks have been allocated to the application program.

Taking the fourth memory block as an example, it contains a part of the memory pool's bookkeeping information and three equal-sized memory pool units; units 1 and 3 are free, while unit 2 has been allocated. When the application needs to allocate a unit of memory through the memory pool, it only needs a simple traversal of the per-block information to quickly locate a block with a free unit. It then uses that block's position information to locate the address of the first free unit directly, returns the address, and marks the next free unit. When the application releases a memory pool unit, the corresponding unit is simply marked as free in the block's information.

Figure 1. The working principle of the memory pool
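To make the mechanism concrete, the following minimal C++ sketch implements a pool of this kind for fixed-size units. The class name FixedPool and all internal details are illustrative assumptions, not the implementation used in this paper; in particular, the free units are threaded onto a single list, which plays the role of the per-unit "free" marks described above.

#include <cstddef>
#include <new>

// Sketch of the Figure 1 principle: memory is obtained from the system one
// block at a time, the blocks are linked together by pointers, and the free
// units are tracked so that allocation is a size check plus pointer offset.
class FixedPool {
public:
    FixedPool(std::size_t unitSize, std::size_t unitsPerBlock)
        // round the unit size up so every unit can hold a link pointer
        : unitSize_((unitSize + sizeof(void*) - 1) / sizeof(void*) * sizeof(void*)),
          unitsPerBlock_(unitsPerBlock) {}

    ~FixedPool() {                        // unified recovery: all blocks are
        while (blocks_) {                 // returned when the pool is destroyed
            char* next = *reinterpret_cast<char**>(blocks_);
            ::operator delete(blocks_);
            blocks_ = next;
        }
    }

    void* allocate() {
        if (!freeList_) grow();                       // expand the pool on demand
        void* unit = freeList_;
        freeList_ = *static_cast<void**>(freeList_);  // mark the next free unit
        return unit;
    }

    void deallocate(void* p) {            // releasing a unit just marks it free
        *static_cast<void**>(p) = freeList_;
        freeList_ = p;
    }

private:
    void grow() {                         // request one more block from the system
        char* block = static_cast<char*>(
            ::operator new(sizeof(char*) + unitSize_ * unitsPerBlock_));
        *reinterpret_cast<char**>(block) = blocks_;   // link the blocks together
        blocks_ = block;
        char* units = block + sizeof(char*);
        for (std::size_t i = 0; i < unitsPerBlock_; ++i)
            deallocate(units + i * unitSize_);        // thread units onto the free list
    }

    std::size_t unitSize_, unitsPerBlock_;
    char* blocks_ = nullptr;              // head of the chain of system blocks
    void* freeList_ = nullptr;            // first free unit
};

With this organization, both allocate() and deallocate() are constant-time pointer operations, which is the source of the speedups measured in Section 4.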
2.2 Small object allocation technology

Because the sizes of the memory blocks requested from a general memory pool are uncertain, the New and Malloc APIs are usually used directly to allocate them. This is not effective for small objects: frequent use causes a large amount of memory fragmentation and reduces performance, so the small object memory allocation technique[7], which is suited to small objects, is used here. The size and number of blocks can be set when the small object allocator is constructed. The Chunk layer contains the logical information needed to hand out and take back blocks of memory; once there are no free blocks left in a Chunk, its allocation function returns zero. The Small Object Pool layer contains a vector in which Chunk objects are stored, extending the Chunk layer. Alongside the chunk container, which stores all the bookkeeping information, two Chunk pointers are kept: one pointing to the Chunk currently used for allocation, and one pointing to the Chunk currently used for release.
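A minimal sketch of the Chunk layer follows, in the style of the small object allocation technique of [10], on which this design is based. The member names are assumptions; the key idea is that the free blocks form an implicit list, with the first byte of each free block storing the index of the next free block, so a Chunk needs almost no bookkeeping memory of its own.

#include <cstddef>

// Chunk layer sketch: manages one slab of blockSize * blocks bytes.
struct Chunk {
    unsigned char* data = nullptr;
    unsigned char firstFree = 0;   // index of the first available block
    unsigned char available = 0;   // number of free blocks left

    void init(std::size_t blockSize, unsigned char blocks) {
        data = new unsigned char[blockSize * blocks];
        firstFree = 0;
        available = blocks;
        unsigned char* p = data;
        for (unsigned char i = 0; i != blocks; p += blockSize)
            *p = ++i;              // free block i links to block i + 1
    }

    void* allocate(std::size_t blockSize) {
        if (!available) return nullptr;   // exhausted: allocation returns zero
        unsigned char* block = data + firstFree * blockSize;
        firstFree = *block;               // unlink the handed-out block
        --available;
        return block;
    }

    void deallocate(void* p, std::size_t blockSize) {
        unsigned char* block = static_cast<unsigned char*>(p);
        *block = firstFree;               // relink the block at the head
        firstFree = static_cast<unsigned char>((block - data) / blockSize);
        ++available;
    }

    void release() { delete[] data; data = nullptr; }
};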
2.3 Scenario analysis of the search engine system

This paper analyzes the characteristics of three scenarios, a fixed-length scenario, a variable-length scenario, and a multiple-allocation scenario, and designs a corresponding memory pool for each.

1) Fixed-length scenario. In the existing search engine system, the cache design takes advantage of hash tables. The original system uses the New and Delete functions for the allocation and release of each node of the hash table, and the size of a node is fixed; a memory pool designed around the allocation and release of fixed-size nodes can improve the speed of cache allocation and release. The present search engine system also uses the STL Map[9] in many places, and the allocation and release of memory for each node in the Map is managed by the STL allocator; taking over the fixed-size node allocation and release enhances efficiency and makes debugging easier. What these two cases have in common is dealing with nodes of a fixed size, so a small object allocator is designed to allocate and release fixed-size memory nodes.

2) Variable-length scenario. In the cache management of the current search system, search results are put into the cache, which helps the next search; the size of each node in the cache is uncertain, and the times at which entries enter and leave the cache cannot be estimated. The update module of the current search system manages the updating and deletion of documents, but the size of a document and the time of an update are unknown. For this scenario, a recoverable variable-length memory pool can be designed, with a lock added to handle the multi-threaded nature of the cache[10].

3) Multiple-allocation scenario. The current search engine returns a result within 10 MB after a keyword is entered, and much of the information that accompanies the results is allocated and released using the New and Delete functions, so New is used frequently, which hurts efficiency and produces memory fragments[11]. Analysis shows that while the search engine is assembling results, memory is allocated frequently, but release happens only once, when the query results are returned. The decisive factors are therefore frequent allocation and a total capacity of no more than 10 MB. It is thus reasonable to allocate a large chunk of memory from which all of the small allocations are carved, with everything released at the end through a single interface. An allocate-not-free memory pool is designed for this scenario.

3. Memory pool design and realization for high concurrency

Analysis of current search engines yields three scenarios: hash table insertion and deletion, cache and document update, and query result return. Three memory pools are designed for them: the Recoverable Fixed Length Memory Pool, the Recoverable Variable Length Memory Pool, and the Allocate Not Free Memory Pool. The design structure of the search system's memory pools is shown in Figure 2.

Figure 2. The design framework of the memory pools of the search system

3.1 Recoverable Fixed Length Memory Pool

The Recoverable Fixed Length Memory Pool is a small object allocator divided into a 4-layer structure, as shown in Figure 3. The bottom layer is the Chunk object: each Chunk manages a large chunk of memory containing an integer number of fixed-size blocks, together with the logical information needed to hand out and take back blocks. When a Chunk has no remaining blocks, allocation fails and returns zero. The second layer is the Fix Allocate layer, which builds on the first, using a vector to extend it and ensure that the amount that can be distributed is extensible. The third layer is the Small Object Allocator, which provides universal allocation and release functions; it builds on the second layer by holding multiple second-layer objects, turning the fixed-length allocation technique into a variable-length one. The fourth layer is the Small Object layer, which packages the third layer, provides a number of generic interfaces for it, and extends it into an allocator usable from multiple threads. Through this layer-by-layer extension, allocation and release efficiency is ensured while the internal structure remains well encapsulated and invisible from the outside; through the common interface, the allocator can be used just like the operating system's default one.

Figure 3. The structure of the small object allocator
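As an illustration of the second layer, the following sketch extends the Chunk sketch from Section 2.2 with a vector, as described above. The class name FixedAllocator and the fixed 255 blocks per Chunk are assumptions, not the paper's code.

#include <vector>

// Fix Allocate layer sketch: a growable set of Chunks for one block size.
class FixedAllocator {
public:
    explicit FixedAllocator(std::size_t blockSize) : blockSize_(blockSize) {}
    ~FixedAllocator() { for (Chunk& c : chunks_) c.release(); }

    void* allocate() {
        for (Chunk& c : chunks_)                 // reuse an existing chunk if possible
            if (void* p = c.allocate(blockSize_)) return p;
        chunks_.emplace_back();                  // otherwise extend the pool
        chunks_.back().init(blockSize_, 255);
        return chunks_.back().allocate(blockSize_);
    }

    void deallocate(void* p) {
        unsigned char* q = static_cast<unsigned char*>(p);
        for (Chunk& c : chunks_)                 // find the owning chunk by address range
            if (q >= c.data && q < c.data + blockSize_ * 255) {
                c.deallocate(p, blockSize_);
                return;
            }
    }

private:
    std::size_t blockSize_;
    std::vector<Chunk> chunks_;
};

The third layer would then hold one such object per block size, and the fourth layer would wrap the third behind a generic, optionally locked interface.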
3.2 Recoverable Variable Length Memory Pool

The recoverable variable length memory pool is a multi-threaded, variable-length, recyclable memory pool whose structure resembles a hash table. An array of linked lists covers an assignable size range: each element of the array is a head pointer for memory blocks of one specific size, pointing to a list of such blocks. The appropriate head pointer is found by aligning the requested size, and a node is then taken from that list. Blocks within the range are initially allocated through New; when released, they are returned to the pool for the next allocation. Blocks beyond the range are also allocated through New, but when released, Delete is called directly and the memory is returned to the operating system. As for threading, a lock is added to ensure thread safety when this is specified through the constructor.

The pool mainly comprises the BlockHeader layer, the tagCtrlUnit layer and the RecycleLitePool layer; the structure is shown in Figure 4. The BlockHeader layer is the bottom of the allocation structure: nCtrlIndex indicates the size class of the block and pNextBlock indicates the next block, forming a linked-list structure. The whole structure is a union, which saves space and improves efficiency. The tagCtrlUnit layer holds the head pointer of each BlockHeader list, together with a member that records the number of BlockHeader objects. The RecycleLitePool layer contains the threading element, the lock element, the pointer to the tagCtrlUnit layer, some counters, and a memory allocator, which defaults to New for allocation and Delete for release.

Figure 4. The structure of RecycleLitePool
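A condensed sketch of a pool organized this way is shown below. The names RecycleLitePool, BlockHeader, nCtrlIndex and pNextBlock follow the text; the size-class granularity, range limit and locking details are assumptions made for illustration.

#include <cstddef>
#include <mutex>
#include <new>

// Sketch of a recoverable variable-length pool: size classes in multiples
// of 8 bytes, one free list per class, New/Delete fallback out of range.
class RecycleLitePool {
    union BlockHeader {                 // union, as in the text: the same word
        BlockHeader* pNextBlock;        // is a link while the block is free
        std::size_t  nCtrlIndex;        // and a size-class index while in use
    };
    static constexpr std::size_t kAlign = 8, kClasses = 1250;  // up to ~10000 bytes

public:
    explicit RecycleLitePool(bool threadSafe = true) : threadSafe_(threadSafe) {}

    void* allocate(std::size_t n) {
        std::size_t idx = (n + kAlign - 1) / kAlign;   // align to find the head pointer
        if (idx >= kClasses) return ::operator new(n); // beyond the range: plain New
        std::unique_lock<std::mutex> lock(mtx_, std::defer_lock);
        if (threadSafe_) lock.lock();
        BlockHeader* b = heads_[idx];
        if (b) heads_[idx] = b->pNextBlock;            // recycle a returned block
        else   b = static_cast<BlockHeader*>(
                   ::operator new(sizeof(BlockHeader) + idx * kAlign));
        b->nCtrlIndex = idx;                           // remember the class for release
        return b + 1;                                  // user data follows the header
    }

    void deallocate(void* p, std::size_t n) {
        if ((n + kAlign - 1) / kAlign >= kClasses) { ::operator delete(p); return; }
        BlockHeader* b = static_cast<BlockHeader*>(p) - 1;
        std::unique_lock<std::mutex> lock(mtx_, std::defer_lock);
        if (threadSafe_) lock.lock();
        b->pNextBlock = heads_[b->nCtrlIndex];         // return the block to the pool
        heads_[b->nCtrlIndex] = b;
    }

private:
    bool threadSafe_;
    std::mutex mtx_;
    BlockHeader* heads_[kClasses] = {};   // tagCtrlUnit-style head per size class
};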
3.3 Allocate Not Free Memory Pool

The Allocate Not Free Memory Pool is divided into four layers. The Memory Chunk layer is the bottom allocation block; it has three members, indicating the block size, the initial address, and the currently available position. The Chain of Memory Chunk layer organizes Memory Chunk objects into a doubly linked list. The Simple Allocate Policy layer accepts allocation requests, rounds the size to be allocated up to no less than the Memory Chunk size, adds the new block to the doubly linked list, and points the current-block pointer at it. The StagePool layer is the outermost layer; its default template parameter is the Simple Allocate Policy type, and it provides the external interfaces for allocation and release. Overall, the StagePool contains a Chunk pointer to the currently allocated block; every allocation request is first served from the current block, and when the remaining margin is not enough, a new block is created and inserted into the list. The allocation strategy is selected through the Allocate Policy template parameter. The structure is shown in Figure 5.

Figure 5. The structure of StagePool
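A minimal sketch of an allocate-not-free pool along these lines follows. The names StagePool and MemoryChunk follow the text, but the chunk chain here is singly linked for brevity (the text describes a doubly linked list), the policy layer is folded in, and the default chunk size is an assumption.

#include <cstddef>
#include <new>

// Allocate-not-free pool sketch: allocation is a pointer bump inside the
// current chunk; nothing is freed until release() returns every chunk.
class StagePool {
    struct MemoryChunk {
        MemoryChunk* prev;     // chain of chunks
        std::size_t  size;     // capacity of this chunk
        std::size_t  used;     // current bump offset
        unsigned char* payload() { return reinterpret_cast<unsigned char*>(this + 1); }
    };

public:
    explicit StagePool(std::size_t chunkSize = 64 * 1024) : chunkSize_(chunkSize) {}
    ~StagePool() { release(); }

    void* allocate(std::size_t n) {
        n = (n + 7) & ~std::size_t(7);                // keep allocations aligned
        if (!cur_ || cur_->used + n > cur_->size) {   // margin not enough: new chunk
            std::size_t sz = n > chunkSize_ ? n : chunkSize_;
            MemoryChunk* c = static_cast<MemoryChunk*>(
                ::operator new(sizeof(MemoryChunk) + sz));
            c->prev = cur_; c->size = sz; c->used = 0;
            cur_ = c;
        }
        void* p = cur_->payload() + cur_->used;       // bump the free pointer
        cur_->used += n;
        return p;
    }

    void release() {                                  // single unified recovery
        while (cur_) {
            MemoryChunk* prev = cur_->prev;
            ::operator delete(cur_);
            cur_ = prev;
        }
    }

private:
    std::size_t chunkSize_;
    MemoryChunk* cur_ = nullptr;   // currently allocated-from chunk
};

Because a query's result data is bounded (under 10 MB) and is discarded all at once when the result is returned, dropping per-object release in this way removes almost all allocator overhead, which matches the large gain reported in Section 4.3.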
4. Performance test and analysis

On the CentOS operating system, with vim, g++ and scons as the editing, compilation and build tools, several scenarios were designed to simulate the actual workload of the search engine and test the performance differences between the default memory allocator and the allocators designed here.

4.1 Performance test and analysis of the small object allocator

1) For the small object allocator test, with 100,000 data items, the system functions New/Delete were tested against the memory pool interface functions Allocate/Deallocate. Ten groups of data were recorded, as shown in Table 1. Analysis of the time differences shows that, relative to the system's New, the small object allocator is 70.20% faster for allocation, and compared with Delete it is 2.29% faster for release.

2) For the hash table adapter test, the Node Allocator class was adapted to the hash table, and a Default Allocator class with an identical interface, whose internal allocation functions are implemented with New/Delete, was adapted to a hash table through template parameters. Single-threaded test: with Node Allocator and Default Allocator, 50,000 blocks were allocated and released; with the program running for a fixed time of 10 s, insert and delete operations were repeated throughout. Multi-threaded test: for Node Allocator and Default Allocator, the same number of threads was opened, and each thread function inserted into and deleted from the corresponding hash table. Table 2 shows the hash table test data. Analysis shows that in the single-threaded case the Node allocator is 22.61% faster for allocation and 18.94% slower for release; in the multi-threaded case it is 13.80% faster for allocation and 1.20% slower for release.

3) For the test of the small object allocator adapted to the Map container, a SmallObjectAllocator type was implemented whose internal allocation and release are carried out by SmallObjectPool, and fitted to Map so that Map node allocation and release call the Allocate/Deallocate interfaces of SmallObjectPool. Similarly, a NewAllocator type was implemented whose internal allocation and release use the New/Delete interfaces and which was likewise fitted to Map; both were compared with the allocator type provided by the system. Single-threaded test: 100,000 items were inserted into each of the three maps. Multi-threaded test: data was inserted into and cleared from the three maps in a loop within a fixed time. Table 3 shows the Map adapter test data. Analysis shows that in the single-threaded case the small object allocator is 48.30% faster than Default and 54.50% faster than New; in the multi-threaded case it is 35.90% faster than Default and 33.10% faster than New.

Table 1. Test data of the small object allocator (µs)

No.        New allocation   New release   Small allocation   Small release
1                5318           2739            3174              2157
2                4662           1994            1271              1792
3                4856           2317            1350              1919
4                5093           3044            1381              1988
5                4899           1953            1253              1916
6                4768           2344            1329              3160
7                4835           2051            1743              1858
8                5852           2070            1388              1819
9                6412           2783            1328              1892
10               5119           2310            1271              1859
Average          5181.4         2360.5          1548.8            2036.0

Table 2. Test data of the hash table allocator adapter (µs)

           ------------ Single thread ------------   ----------- Multi-threaded -----------
No.        Default   Default   Node      Node        Default   Default   Node      Node
           alloc.    release   alloc.    release     alloc.    release   alloc.    release
1            42        39        33        48          141       130       121       131
2            43        39        33        48          143       130       123       131
3            43        40        33        47          147       127       119       125
4            43        40        33        48          147       133       131       134
5            43        39        33        48          141       127       123       131
6            43        40        33        48          140       125       121       128
7            43        40        33        48          143       129       126       131
8            43        39        33        48          140       125       119       125
9            43        40        34        39          141       125       123       129
10           43        40        34        49          141       127       122       129
Average      42.9      39.6      33.2      47.1        142.4     127.8     122.8     129.4

Table 3. Test data of the Map allocator adapter (µs)

           ----- Single thread -----   ----- Multi-threaded -----
No.        Default   Small   New       Default   Small    New
1            18        10      23        340       229      378
2            19        10      21        381       247      375
3            18        10      19        391       261      361
4            17         9      20        404       241      356
5            19         9      19        419       252      383
6            18         9      19        388       251      377
7            16         9      19        367       260      374
8            18         8      19        373       249      401
9            17         8      22        435       257      366
10           16         9      19        396       248      360
Average      17.6       9.1    20.0      389.4     249.5    373.1

4.2 Performance test and analysis of the recoverable variable length memory pool

Given a set of arrays with assigned sizes from 1 to 10,000, 4 threads and 5,000 insert/delete actions, the variable length memory pool was used to allocate the arrays and store their pointers, which were then released and reallocated, cycling 20 times to obtain the test data. For comparison, under the same conditions Malloc was used to open the corresponding bytes of memory and store them in another pointer array, and the allocation and release times of the system functions were measured. Table 4 shows the test data. Analysis shows that RecycleLitePool is 13.84% faster than New.

Table 4. Test data of the recoverable variable length allocator (ms)

No.        New    RecycleLitePool
1          183          177
2          185          157
3          181          154
4          197          164
5          177          153
6          183          154
7          183          156
8          182          154
9          183          159
10         181          153
Average    183.5        158.1

4.3 Performance test and analysis of the allocate not free memory pool

A new allocator structure was built with alloc as its interface, allocating 4 bytes from the allocator's space each time, 10,000 times in total. As a comparison, the system New function was called to allocate 4 bytes each time, and the time of 50,000 allocation actions was recorded. Table 5 shows the test data. Analysis shows that StagePool is 90.80% faster than New.

Table 5. Test data of the allocate not free allocator (µs)

No.        StagePool   New
1              4        51
2              5        47
3              4        48
4              4        48
5              5        48
6              5        48
7              4        48
8              4        47
9              5        47
10             4        47
Average        4.4      47.9
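A harness along the following lines (an assumed reconstruction, not the authors' test code) reproduces the setup of Table 5, timing the StagePool sketch from Section 3.3 against new/delete with std::chrono; for simplicity it uses the same number of rounds for both sides.

#include <chrono>
#include <cstdio>

// Minimal timing harness for the allocate-not-free comparison: allocate a
// 4-byte object repeatedly through each interface and report elapsed time.
// Assumes the StagePool sketch from Section 3.3 is in scope.
int main() {
    constexpr int kRounds = 10000;
    using Clock = std::chrono::steady_clock;

    StagePool pool;
    auto t0 = Clock::now();
    for (int i = 0; i < kRounds; ++i) pool.allocate(4);   // bump allocation only
    auto t1 = Clock::now();
    pool.release();                                       // one unified recovery

    auto t2 = Clock::now();
    for (int i = 0; i < kRounds; ++i) delete new int;     // system allocator baseline
    auto t3 = Clock::now();

    std::printf("StagePool: %lld us, new/delete: %lld us\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count());
}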
5. Conclusion

1) Analysis of current search engines yields three scenarios: hash table insertion and deletion, cache and document update, and query result return. Three memory pools were designed for them: the Recoverable Fixed Length Memory Pool, the Recoverable Variable Length Memory Pool, and the Allocate Not Free Memory Pool.

2) The system's default memory management functions, malloc/free and new/delete, were analyzed with respect to the various factors involved; allocating and freeing memory on the heap increases overhead. The memory pool designs were applied to the search engine system, optimizing its internal memory management and improving search speed. In the tests of the three memory pools, compared with the system's default memory management, efficiency increased by 70.20%, 13.84% and 90.80% respectively.

Sponsors or Supporters

This paper is partially supported by the Special Research Project of the Shaanxi Provincial Department of Education "16JK1376".

References

[1] DAI Chunyan, XU Zhiwen. Discussion About Malloc/Free and New/Delete in C++ [J]. Science & Technology of Baotou Steel (Group) Corporation, 2009(35): 59.
[2] LI Qian, PAN Minxue, LI Xuandong. Benchmark of Tools for Memory Leak [J]. Journal of Frontiers of Computer Science and Technology, 2010(01): 29.
[3] WANG Xiaoyin, CHEN Lijun. Implementation and Application of the Memory Pool in Linux Kernel [J]. Journal of Xi'an University of Posts and Telecommunications, 2011(04): 40.
[4] QU Weihua, WANG Qun. Introduction and Analysis of Search Engine Principles [J]. Computer Knowledge and Technology, 2006(06): 113.
[5] DUAN Bingying. Study and Design of Web Crawler in Search Engine [D]. Xidian University, 2014.
[6] GUO Bingxuan, ZHANG Jingli, ZHANG Zhichao. Algorithm of Spatial Data Scheduling Based on Memory Pool [J]. Computer Engineering, 2008, 34(06): 63.
[7] LIU Tao, NIE Xiaofeng, JING Jiwu, WANG Yuewu. Memory Management in Worm Simulation Based on Small Object Memory Allocation Technique on the GTNetS [J]. Journal of Graduate University of Chinese Academy of Sciences, 2012, 29(01): 131.
[8] GUO Xufeng, YU Fang, LIU Zhongli. An Efficient Memory Built-in Self-Repair Method Based on Hash Table [J]. Acta Electronica Sinica, 2013(07): 1371.
[9] LAI Xiangfang. Select the Appropriate STL Containers [J]. Digital Technology and Application, 2015(09): 177.
[10] Alexandrescu A. Modern C++ Design: Generic Programming and Design Patterns Applied [M]. Boston: Addison-Wesley Professional, 2001.
[11] Robert W. P. Luk, Wai Lam. Efficient In-Memory Extensible Inverted File [J]. Information Systems, 2007(32): 733.

Author Brief

LIU Pingping (1971-), female, Associate Professor, Xi'an Technological University. Research area: artificial intelligence.