Correct and stable sorting for overflow streaming data with a limited storage size and a uniprocessor


Correct and stable sorting for overflow
streaming data with a limited storage size
and a uniprocessor
Suluk Chaikhan, Suphakant Phimoltares and Chidchanok Lursinsap

Advanced Virtual and Intelligent Computing (AVIC) Research Center, Department of
Mathematics and Computer Science, Faculty of Science, Chulalongkorn University,
Bangkok, Thailand

ABSTRACT
Tremendous quantities of numeric data have been generated as streams in various
cyber ecosystems. Sorting is one of the most fundamental operations to gain
knowledge from data. However, due to size restrictions of data storage which
includes storage inside and outside CPU with respect to the massive streaming data
sources, data can obviously overflow the storage. Consequently, all classic sorting
algorithms of the past are incapable of obtaining a correct sorted sequence because
data to be sorted cannot be totally stored in the data storage. This paper proposes a
new sorting algorithm called streaming data sort for streaming data on a
uniprocessor constrained by a limited storage size and the correctness of the sorted
order. Data continuously flow into the storage as consecutive chunks with chunk
sizes less than the storage size. A theoretical analysis of the space bound and the time
complexity is provided. The sorting time complexity is O (n), where n is the number
of incoming data. The space complexity is O (M), where M is the storage size. The
experimental results show that streaming data sort can handle a million permuted
data by using a storage whose size is set as low as 35% of the data size. This proposed
concept can be practically applied to various applications in different fields where the
data always overflow the working storage and sorting process is needed.

Subjects Algorithms and Analysis of Algorithms, Artificial Intelligence, Data Science
Keywords Algorithms, Sorting, Memory, Algorithm design and analysis,
Computational intelligence

INTRODUCTIONS
Currently, the growth of data consumption by internet users has exponentially increased
(Laga et al., 2017; Bey Ahmed Khernache, Laga & Boukhobza, 2018), and a massive
storage size is required to store all incoming data to avoid any data loss in case of storage
overflow (Thusoo et al., 2010; Katal, Wazid & Goudar, 2013; Witayangkurn, Horanont &
Shibasaki, 2012; Mehmood et al., 2016). However, many applications such as data
management, finance, sensor networks, security-relevant data, and web search possibly face
this unexpected situation of a storage overload issue (Lee et al., 2016; Babcock et al., 2002;
Keim, Qu & Ma, 2013; Cardenas, Manadhata & Rajan, 2013; Dave & Gianey, 2016).
This issue induces the problem of representing big data with a limited storage size.
Furthermore, some primitive operations such as the classic sorting algorithms (e.g., quick

How to cite this article Chaikhan S, Phimoltares S, Lursinsap C. 2021. Correct and stable sorting for overflow streaming data with a limited
storage size and a uniprocessor. PeerJ Comput. Sci. 7:e355 DOI 10.7717/peerj-cs.355

Submitted 26 October 2020
Accepted 17 December 2020
Published 12 February 2021

Corresponding authors
Suluk Chaikhan,
suluk.c@student.chula.ac.th
Suphakant Phimoltares,
suphakant.p@chula.ac.th

Academic editor
Marcin Woźniak

Additional Information and
Declarations can be found on
page 24

DOI 10.7717/peerj-cs.355

Copyright
2021 Chaikhan et al.

Distributed under
Creative Commons CC-BY 4.0

http://dx.doi.org/10.7717/peerj-cs.355
mailto:suluk.�c@�student.�chula.�ac.�th
mailto:suphakant.�p@�chula.�ac.�th
https://peerj.com/academic-boards/editors/
https://peerj.com/academic-boards/editors/
http://dx.doi.org/10.7717/peerj-cs.355
http://www.creativecommons.org/licenses/by/4.0/
http://www.creativecommons.org/licenses/by/4.0/
https://peerj.com/computer-science/


sort, heap sort) cannot be implemented due to the restrictive constraint of storing all
sorted data inside the storage during the sorting process. A sorting algorithm is the first
important step of many algorithms (Cormen et al., 2009; Huang, Liu & Li, 2019) such as
searching and finding a closest pair (Singh & Sarmah, 2015; Tambouratzis, 1999).

Generally, when referring to data storage of a computer, it can be either primary storage
(internal storage) or secondary storage (external storage). The size of primary storage is
much smaller than that of the secondary storage. With reference to the size of storage,
there are two types of sorting: internal sort and external sort. All data to be sorted by an
internal sorting algorithm must be entirely stored inside the primary storage. Some of
traditional internal sorting algorithms are bubble sort, insertion sort, quick sort, merge
sort, and radix sort. However, if the data overflow the primary storage, the overflow must
be stored in the secondary storage. In this case, external sort algorithms can be employed.
Although these classic sorting algorithms are very efficient in terms of time and space
complexities, the actual quantity of data generated yearly on the internet has grown
tremendously faster than the growth rate of storage capacity based on the current
fabrication technology (for both primary storage and secondary storage). This severe
condition makes the classic sorting algorithms, where all data must be stored inside the
computer, very inefficient because all overflowed data are lost.

In this study, both internal and external storage are viewed as one unit of storage
with a limited size. This size is not gradually augmented during the sorting process of
continuously incoming data. The challenging problem to be studied is how to sort the data
under the constraints of limited storage capacity and storage overflow. The data are
assumed to flow into the storage as a sequence of data chunks with various sizes less than
or equal to the storage size.

Recently, many internal sorting algorithms have been remodeled by reducing
comparison, swapping, and the time complexity to reduce the sorting time. Farnoud,
Yaakobi & Bruck (2016) proposed an algorithm that sorts big data based on limited
internal storage, but the result is a partial sort. Concoms sort (Agrawal & Sriram, 2015) is
an algorithm that uses a swapping technique with no adjacent swapping. It reduces the
execution time in some cases when compared to selection sort and outperforms bubble
sort in every case. In particular, in the case that the input is a descending sequence,
Concoms sort is more efficient than both traditional algorithms. Mapping sort (Osama,
Omar & Badr, 2016) is a new algorithm that does not use comparisons and the swapping
technique but it uses the mapping technique instead. This algorithm achieved the worst
case time complexity of O(n) + O(n log n). Vignesh & Pradhan (2016) proposed a sorting
algorithm by improving merge sort. It uses multiple pivots to sort data. The execution
time of this algorithm is better than quick sort and merge sort in the best case and the
average case, respectively. In addition, proximity merge sort (Franceschini, 2004) was
proposed by improving the algorithm with an in-place property. Faro, Marino & Scafiti
(2020) modified insertion sort to reduce the time complacency by inserting multiple
elements for one iteration. The time complexity is Oðn1þ1hÞ, where h 2 N. Idrizi, Rustemi &
Dalipi (2017) modified the sorting algorithm by separating the data sequence into three

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 2/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


parts, namely, negative numbers, zero numbers, and positive numbers. After the data
in each part are sorted by printing the result, the algorithm can decrease the comparison by
a separating process. Bidirectional conditional insertion sort algorithm (Mohammed,
Amrahov & Çelebi, 2017) is a two-pivot insertion sort algorithm using the left comparator
and right comparator. It is faster than insertion sort, and the time complexity is nearly
close to O(n1.5). Brownian motus and clustered binary insertion sort methods (Goel &
Kumar, 2018) are algorithms that adapted insertion sort and binary insertion sort to
reduce the comparison and the execution time. Both algorithms are suitable for sorting
partial data. Internal sorting algorithms in the literature have focused on reducing the time
for processing, but the storage issue for big data has been ignored.

Presently, accessing a large piece of information or big data is simple because of rapid
technological advancements such as the cloud (Al-Fuqaha et al., 2015; Kehoe et al.,
2015; Vatrapu et al., 2016) and network technology (YiLiang & Zhenghong, 2016; Zhao,
Chang & Liu, 2017; Zhai, Zhang & Hu, 2018). One of the issues for sorting big data is the
restricted internal storage, which is usually smaller than the size of big data. All big
data cannot be stored in the internal storage. Therefore, the internal sorting algorithms
cannot sort big data at one time. The external sorting algorithms are developed from
the classic merge sorting algorithm to sort big data, which is separated into two phases:
(1) the sorting phase sorts a small chunk of big data in the internal storage. After sorting,
all sorted chunks are stored in the external storage and (2) the merging phase combines all
sorted chunks from the sorting phase into a single sorted list.

Recently, TaraByte sort (O’Malley, 2008) has used three Hadoop applications, namely,
TeraGen, TeraSort, and TeraValidate, to sort big data. This algorithm sorts 10 billion
data in 209 s. This process is very fast, but it is expensive because it requires many
processing units for sorting. Kanza & Yaari (2016) studied external sorting problems and
designed multi-insertion sort and SCS-Merge V1 to V3. The objective of these algorithms
is to decrease the write cost of intermediate results of sorting. Active sort (Gantz &
Reinsel, 2012) is an algorithm that merges sorted chunks inside SSDs and is applied with
Hadoop to reduce the number of reading and writing data. MONTRES (Laga et al., 2017),
the algorithm designed for SSDs, can reduce the read and write cost of I/O in a linear
function. External sorting algorithms in the literature have focused on reducing the
read and write cost in terms of the execution time for processing, but the storage
issue for keeping big data is still ignored. Liang et al. (2020) proposed a new algorithm,
namely, B*-sort, which was designed on NVRAM and applied on a binary search tree
structure.

In addition, Farnoud, Yaakobi & Bruck (2016) studied approximate sorting of streaming
permuted data with limited storage; however, the result is not exactly sorted data, and only
an approximate result is obtained when determining the data positions. Conversely,
the approximate positions of ordered data can be provided when using the values as inputs.
Elder & Goh (2018) studied permutation sorting by finite and infinite stacks. Although
all possible permutations cannot be sorted, the exact order and values can be obtained.
Let n be the total streaming numbers to be sorted and M ≪ n be the limited size of working

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 3/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


storage. Table 1 summarizes the efficiency of various classic sorting methods and our
proposed method (stream sort) in terms of seven characteristics: requiring extra storage,
preserving the input appearance order, time complexity, space complexity, sorting
streaming data, correct sorting order, and correct retrieved value by the sorted order.

This article proposes a new algorithm called streaming data sort for sorting streaming
data with limited storage size by using only a single central processing unit. The proposed
algorithm can correctly and stably handle a streaming data size of at least 2.857 times
larger than the size of the working storage. The following concerns are emphasized in
this study.

� All data must be in the exactly correct order after being sorted. No approximate and
partial ordering is allowed in this study.

� The time complexity of streaming data sort of all iterations is O(n).

CONSTRAINTS
In the stationary data environment, all classic sorting algorithms are based on the
assumption that all numbers to be sorted must be entirely stored in the working storage of
a computer during the sorting process. This implies that the whole data set cannot
exceed the working storage size during the sorting process. Figure 1 illustrates storage
constraint of the working storage in streaming data sort. However, in the streaming data
environment, the data continuously flow into the computer one chunk at a time, and the
number of incoming chunks is unknown in advance. If the size of the data chunk is
larger than the working storage size, then the overflow will be permanently discarded from
the computer. This makes the sorted result wrong. To make the study sufficiently feasible
for analysis and practice, the following constraints are imposed.

Table 1 Comparison of sorting algorithms on streaming data. n is the total streaming numbers to be sorted and M ≪ n is the limited size of
working storage.

Sorting algorithms Requiring
extra storage

Preserving input
appearance order

Time
complexity

Working space
complexity

Applicable to
streaming data

Correct
order

Correct
value

Bubble sort No Yes O(n2) O(n) No Yes Yes

Selection sort No No O(n2) O(n) No Yes Yes

Insertion sort No Yes O(n2) O(n) Yes Yes Yes

Quick sort No No O(n2) O(n) No Yes Yes

Merge sort Yes Yes O(n lg n) O(n) No Yes Yes

Heap sort No No O(n lg n) O(n) No Yes Yes

Permutation sort
(Farnoud, Yaakobi &
Bruck, 2016)

No No O(n/ω(log2 n)) O(n) No Yes Yes

Permutation sort
(Elder & Goh, 2018)

Yes Yes N/A O(n) Yes No No

External sorting Yes Yes N/A O(n) Yes Yes Yes

Streaming data sort No Yes O(n) O(M) Yes Yes Yes

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 4/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


1. The sorting process is performed by using only a fixed working storage of size M.
This working storage is for storing the incoming data, previously sorted data, and other
temporal data structures generated during the sorting process. No extra storage
module is added during this period. The proposed sorting algorithm and the operating
system are not stored in this working storage.

2. All numbers are integers. For floating numbers, they must first be transformed into
integers.

3. At any time t, the sizes of previously sorted data in a compact form and the size of next
incoming data chunk (h) must not exceed M.

4. The present incoming data chunk is completely discarded after being processed by the
proposed sorting algorithm.

5. Only four types of relation between any two temporal consecutive numbers di and di + 1
are studied in this paper. The details and rationale of concentrating on these four types
will be elaborated later.

Figure 1 Storage constraint. Case 1 for D ≤ M where all data must be in the storage. Case 2 for D ≫
M and D ≤ M + E where data overflow the storage. Case 3 for mwork = M + E, the constraint of this
study. Full-size DOI: 10.7717/peerj-cs.355/fig-1

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 5/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-1
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


The second constraint is the main concern of this study. After sorting the first incoming
data chunk, all numbers are captured in a compact form and all sorted numbers are
completely discarded. This compact form is used in conjunction with the next incoming
data chunk for sorting. To avoid a storage overflow obstruction, the fourth constraint
must be presented. The last constraint is derived from real-world data sets. From the
observation of real-world streaming data sets from the UCI Repository (Dua & Graff,
2019) such as the census income, diabetes 130-US hospitals, incident management process
event log, PM2.5 of five Chinese cities, KEGG metabolic relation network, Beijing
multi-site air quality, and Buzz in social media, it is remarkable that most of the different
values between two temporal consecutive numbers are between 0.38 and 2.98 on average.
Hence, only four types of relations between any two temporal consecutive numbers are
the focus. The definition of each type will be given in the next section.

DEFINITIONS AND NOTATIONS
Definition 1 The window at time t, denoted by W(t) = (d1, d2, …, dh), is a sequence of h ≤ M
incoming numeric data at time t.
Definition 2 The sorted window of W(t) at time t, denoted by W(t) = (w1, w2, …, wh | wi = dj,
wi + 1 = dk and 8wi; wiþ1 2 WðtÞ : wi < wiþ1Þ, is a sequence of increasingly sorted numeric
data of W(t).
Definition 3 Type-1 subsequence T1 = (wi, …, wi + l) � WðtÞ is a sequence such that ∀ wi,
wi + 1 ∈ T1: |wi − wi + 1| = 1.

An example of a Type-1 sequence is (1, 2, 3, 4, 5). The different value between any two
adjacent numbers is equal to 1, namely, (|1 − 2|,|2 − 3|,|3 − 4|,|4 − 5|) = (1,1,1,1).
Definition 4 Type-2 subsequence T2 = (wi, …,wi+l) ⊆W

(t) is a sequence such that ∀wi+a,
wi+a+1 ∈ T2, 0 ≤ a ≤ l−1 : |wi+a−wi+a+1| = 1 when a is even and |wi+a−wi+a+1| = 2 when a
is odd.

An example of a Type-2 sequence is (4, 5, 7, 8, 10). The different value between any two
adjacent numbers is equal to either 1 or 2, namely, (|4–5|, |5–7|, |7–8|, |8–10|) = (1, 2, 1, 2).
Definition 5 Type-3 subsequence T3 = (wi, …,wi+l) ⊆W

(t) is a sequence such that ∀wi+a,
wi+a+1 ∈ T3, 0 ≤ a ≤ l−1 : |wi+a−wi+a+1| = 2 when a is even and |wi+a−wi+a+1| = 1 when a is odd.

An example of a Type-3 sequence is (5, 7, 8, 10, 11). The different value between any two
adjacent numbers is equal to either 1 or 2, namely, (|5–7|, |7–8|, |8–10|, |10–11|) = (2, 1, 2, 1).
Definition 6 Type-4 subsequence T4 = (wi,…,wi+l) ⊆ W

(t) is a sequence such that ∀wi,wi+1 ∈
T4: |wi−wi+1| = 2.

An example of a Type-4 sequence is (8, 10, 12, 14, 16). The different value between any
two adjacent numbers is equal to either 1 or 2, namely, (|8–10|, |10–12|, |12–14|, |14–16|) =
(2, 2, 2, 2).

During the sorting process by the proposed algorithm, it is necessary to identify the type
of subsequence to be sorted first. Given a subsequence (wi,…,wi+l) ∈W

(t), the type of this
subsequence can be easily identified as type-p by setting

p ¼ wiþ2 þ wiþ1−2ðwi þ 1Þ (1)

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 6/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Each already sorted subsequence (wi, …, wi + l) ∈ W
(t) is compactly written in a form

of (u, v)(p) where u = wi and v = wi + l are used during the sorting process to minimize
the storage use. (u, v)(p) is named compact group p. Any numeric data in between u and v
are called removed data. These removed data are not considered and can be removed
after the sorting process. For example, subsequence (1, 2, 3, 4, 5) is compacted as (1, 5)(1);
(4, 5, 7, 8, 10) is compacted as (4, 10)(2); (5, 7, 8, 10) is compacted as (5, 10)(3); and
(8, 10, 12, 14) is compacted as (8, 14)(4).

Note that a sequence W(t) may contain several compact groups and some single
numbers. Suppose W(t) = (1, 2, 3, 4, 5, 7, 8, 10, 12, 14, 19). This sequence consists
of the following subsequences (1, 5)(1), (8, 14)(4). Thus, W(t) can be rewritten in
another form of compact groups and a set of single numbers as W(t) = ((1, 5)(1), 7,
(8, 14)(4), 19). However, it is possible to find another set of compact groups from W(t) as

WðtÞ¼ ððð1; 3Þð1Þ; ð4; 7Þð2Þ; ð8; 14Þð4Þ; 19Þ. Obviously, different sets of compact groups for
any W(t) use different storage sizes to store them.

To distinguish between W(t) written in the original sequence of numbers and W(t)

written in a form of compact groups having a set of single numbers, the notation Q(t) is
used instead of W(t) to denote a combination set of compact groups and single numbers.
Each compact group i in Q(t) is denoted by qi. In fact, either each compact group or a
single number in Q(t) can be considered as an element of Q(t). For example, if W(t) = (1, 2, 3,
4, 5, 7, 8, 10, 12, 14, 19), then Q(t) = ((1, 5)(1), 7, (8, 14)(4), 19) such that q1 = (1, 5)

(1),
q2 = (8, 14)

(4). All removed data of compact group (u, v)(p) will be occasionally retrieved to
obtain a complete sorted subsequence in order to involve the new incoming subsequence in
the sorting process. Hence, each retrieved number is denoted by ri to make it different
from each input number wi during the sorting process. The retrieved sequence of (u, v)

(p),
denoted R((u, v)(p)), can be obtained by using the following rules.

r1 ¼ u (2)
r1þl ¼ v (3)

riþ1 ¼
(

ri þ 1 for p ¼ 1
ri þ ðri � r1 þ p � 1Þ mod 3 for p ¼ 2; 3
ri þ 2 for p ¼ 4

(4)

To illustrate how to retrieve all numbers from a compact group, consider an example
of sequence (5, 7, 8, 10) represented by the compact group (5, 10)(3). The retrieved
numbers of (5, 10)(3) can be computed as follows: Since p = 3, r1 = 5, r2 = (5) + ((5) − (5) +
(3) − 1) mod 3 = 7, r3 = (7) + ((7) − (5) + (3) − 1) mod 3 = 8, r4 = (8) + ((8) − (5) + (3) − 1)
mod 3 = 10, r4 = 10 = v. Accordingly, R ð5; 10Þð3Þ

� �
¼ ð5; 7; 8; 10Þ.

CONCEPTS
The size of each incoming sequence is assumed to be at most the size of working storage.
To make the working storage available for storing the next incoming data chunk after
sorting the current chunk, it is required to represent some sorted subsequent numbers in a
form of a compact group. However, not all subsequent sorted numbers can be compacted.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 7/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


The properties and concept of a compact group representation will be discussed next. The
sorting process consists of the following three main steps.

1. Transform the initial incoming sequence W(1) into a set of compact groups qi ∈ Q
(1).

2. At time t, obtain the next incoming sequence and insert each number wi ∈ W
(t) into the

previous Q(t − 1) at the appropriate position.

3. If there exist any adjacent compact groups qi = (a,b)
(a) and qi + 1 = (c,d)

(β) such that the
retrieved sequences R((a,b)(a)) and R((c,d)(β)) satisfy one of the types of subsequences,
then form a new compact group from the sequences of R((a,b)(a)) and R((c,d)(β)).

Steps 2 and 3 are iterated until there are no more incoming sequences. The details of
each step will be discussed next. Figure 2 shows an example of how the proposed
approximate sorting works. The storage size |mtot| is 10. The first incoming 10-number
sequence, that is, (18, 1, 10, 6, 2, 12, 9, 3, 16, 19), fills the whole storage. This sequence is
sorted in an ascending order and forms a set Q(1) = ((1, 3)(1), 6, (9, 12)(2), (16, 19)(3)),
as shown in Fig. 2A. The size of the storage used is decreased to 7. The second incoming
sequence (14, 11, 17) is inserted into some compact groups in Q(1) to obtain Q(2) = ((1, 3)(1),
6, (9, 12)(1), 14, (16, 19)(1)), as shown in Fig. 2B. The size of the storage used after the second
incoming sequence is increased to 8. The third incoming sequence (13, 20) is separately
grouped with (9, 12)(1) and (16, 19)(1) from the previous Q(2) to make a new Q(3) = ((1, 3)(1),
6, (9, 13)(1), 14, (16, 20)(1)). Observe that the compact group (9, 13)(1) can be grouped
with the single number 14 to make (9, 14)(1). Therefore, Q(3) = ((1, 3)(1), 6, (9, 14)(1),
(16, 20)(1)). The fourth incoming sequence (8, 4, 15) is possibly and separately grouped with
(9, 14)(1), (1, 3)(1), and (16, 19)(1) in Q(3) to obtain Q(4) = ((1, 4)(1), 6, (8, 20)(1)). The last
incoming sequence (5, 7) is possibly and separately grouped with (1, 4)(1), 6, (8, 20)(1) in Q(4)

to obtain Q(5) = ((1, 20)(1)).

PROPOSED ALGORITHM
The proposed sorting algorithm is composed of the following two major steps. These steps
are based on the constraints previously imposed in Constraints Section.

1. Obtain the first input number sequence and sort the number in an ascending order.
Then, create Q(1), a set of compact groups and a set of a single number.

2. At time t, obtain the next set of number sequences and insert the numbers into Q(t − 1) to
create the next Q(t).

3. Repeat step 2 until there are no more new incoming sequences.

The deils of steps 1 and 2 will be discussed in the following sections.

Creating compact groups
There are four types of compact groups. To identify the type of compact group from a
number sequence, four counters c1, c2, c3, and c4 for type-1, type-2, type-3, and type-4,
respectively, are employed. Let s(i) be the status condition of type-i. The value of s(i) is
defined as follows.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 8/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Definition 7 Type-1 status condition s(1) of a datum wk + i and its neighbors in a type-1
subsequence wk,…,wk + i,…,wk + m, where m < h is a constant defined by:

sð1Þ ¼ 1 wkþi � wkþi�1 ¼ 1 for 0 � i � m
0 otherwise:

�

Definition 8 Type-2 status condition s(2) of a datum wk + i and its neighbors in a type-2
subsequence wk,…,wk + i,…,wk + m, where m < h is a constant defined by:

sð2Þ ¼ 1 ðwkþi�1 � wkÞ mod 3 þ 1 ¼ wkþi � wkþi�1 for 0 � i � m
0 otherwise:

�

Figure 2 An example of streaming data sort. The sorting steps are illustrated in subfigures (A), (B),
(C), (D) and (E). Full-size DOI: 10.7717/peerj-cs.355/fig-2

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 9/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-2
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Definition 9 Type-3 status condition s(3) of a datum wk + i and its neighbors in a type-3
subsequence wk,…,wk + i,…,wk + m, where m<h is a constant defined by:

sð3Þ ¼ 1 ðwkþi�1 � wk þ 2Þ mod 3 ¼ wkþi � wkþi�1 for 0 � i � m
0 otherwise:

�

Definition 10 Type-4 status condition s(4) of a datum wk + i and its neighbors in a type-4
subsequence wk,…,wk + i,…,wk + m, where m < h is a constant defined by:

sð4Þ ¼ 1 wkþi � wkþi�1 ¼ 2 for 0 � i � m
0 otherwise:

�

The notations in this paper are given in Table 2.
Q(t)||S|| C denotes orderly concatenating Q(t), S, C according to the sorted order of all

elements in W(t). The quantity of removed data of type-1 is greater than those of the
other types. The difference of the first and the last data of a type-4 compact group is larger
than the differences in the other types. To greatly reduce and control the storage size,
the sequences of types 1 and 4 are detected before the sequences of types 2 and 3.
Suppose the following sequence (1, 3, 4, 5, 6) is given. If types 1 and 4 are considered before
types 2 and 3, then the given sequence is compacted as 1, (3, 6)(1), which requires 4 units
of storage to store numbers 1, 3, 6, 1. However, if types 2 and 3 are considered before
types 1 and 4, then the given sequence is compacted as (1, 4)(2), 5, 6, which requires 5 units
of storage to store numbers 1, 4, 2, 5, 6.

Theorem 1 If p ¼ arg max
1�i�4

ðciÞ, then p denotes the correct type of the compact group.
Proof: Suppose the sorted sequence is W(t) = (w1, w2, …, wh). We consider each type of
compact group. Let s(i)t be the status condition of type-i at time t and T

(i) = (s(i)1 , s
(i)
2 , …, s

(i)
h )

be the sequence of s(i)t . There are four cases to be investigated.

Table 2 Notations in streaming data sort.

Notations Short definitions Examples

di The i
th incoming datum −3, 0, 10

(d1,d2,d3,…) Sequence of streaming data (−3, 0, 10,…)

h Window size at iteration t 5, 0, 4, 1

wi The i
th member in a window 18, 1, 10

W(t) Unsorted window at iteration t (18, 1, 10, 6,…)
W(t) Sorted window at iteration t (1, 6, 10, 18,…)
p Type of sub-sequence 1, 2, 3, 4

Tp Type-p sub-sequence (2, 4, 6, 8, 10, 12)

(u, v)(p) Type-p compact group (2, 12)(3)

ri The i
th retrieved number 2, 4, 6, 8, 10, 12

R((u, v)(p)) Retrieved sequence of (u, v)(p) (2, 4, 6, 8, 10, 12)

s(i) Status condition of type-i 0, 1

qi The i
th compact group (1, 5)(4), (9, 12)(1)

S Set of single numbers {6, 7}

C Set of compact groups {(1, 5)(4), (9, 12)(1)}

Q(t) Combining set of C and S at iteration t {(1, 5)(4), 6, 7, (9, 12)(1)}

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 10/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Case 1 (type-1): Suppose the sorted sequence W(t) = (w1, w2, …, wh) is in type-1. Then,
we have the following four sequences of the status condition.

Tð1Þ ¼ sð1Þ1 ¼ 0; sð1Þ2 ¼ 0; sð1Þ3 ¼ 1; . . . ; sð1Þh ¼ 1
� �

or ð0; 0; 1; . . . ; 1Þ

Tð2Þ ¼ sð2Þ1 ¼ 0; sð2Þ2 ¼ 0; sð2Þ3 ¼ 0; . . . ; sð2Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð3Þ ¼ sð3Þ1 ¼ 0; sð3Þ2 ¼ 0; sð3Þ3 ¼ 0; . . . ; sð3Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð4Þ ¼ sð4Þ1 ¼ 0; sð4Þ2 ¼ 0; sð4Þ3 ¼ 0; . . . ; sð4Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Obviously, the value of c1 ¼
Ph

t¼1 s
ð1Þ
t is larger than that of c2, c3, and c4.

Case 2 (type-2): Suppose the sorted sequence W(t) = (w1, w2, …, wh) is in type-2. Then,
we have the following four sequences of the status condition.

Tð1Þ ¼ sð1Þ1 ¼ 0; sð1Þ2 ¼ 0; sð1Þ3 ¼ 0; . . . ; sð1Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð2Þ ¼ sð2Þ1 ¼ 0; sð2Þ2 ¼ 0; sð2Þ3 ¼ 1; . . . ; sð2Þh ¼ 1
� �

or ð0; 0; 1; . . . ; 1Þ

Tð3Þ ¼ sð3Þ1 ¼ 0; sð3Þ2 ¼ 0; sð3Þ3 ¼ 0; . . . ; sð3Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð4Þ ¼ sð4Þ1 ¼ 0; sð4Þ2 ¼ 0; sð4Þ3 ¼ 0; . . . ; sð4Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Obviously, the value of c2 ¼
Ph

t¼1 s
ð2Þ
t is larger than that of c1, c3, and c4.

Case 3 (type-3): Suppose the sorted sequence W(t) = (w1, w2, …, wh) is in type-3. Then,
we have the following four sequences of the status condition.

Tð1Þ ¼ sð1Þ1 ¼ 0; sð1Þ2 ¼ 0; sð1Þ3 ¼ 0; . . . ; sð1Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð2Þ ¼ sð2Þ1 ¼ 0; sð2Þ2 ¼ 0; sð2Þ3 ¼ 0; . . . ; sð2Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð3Þ ¼ sð3Þ1 ¼ 0; sð3Þ2 ¼ 0; sð3Þ3 ¼ 1; . . . ; sð3Þh ¼ 1
� �

or ð0; 0; 1; . . . ; 1Þ

Tð4Þ ¼ sð4Þ1 ¼ 0; sð4Þ2 ¼ 0; sð4Þ3 ¼ 0; . . . ; sð4Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Obviously, the value of c3 ¼
Ph

t¼1 s
ð3Þ
t is larger than that of c1, c2, and c4.

Case 4 (type-4): Suppose the sorted sequence W(t) = (w1, w2, …, wh) is in type-4. Then,
we have the following four sequences of the status condition.

Tð1Þ ¼ sð1Þ1 ¼ 0; sð1Þ2 ¼ 0; sð1Þ3 ¼ 0; . . . ; sð1Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð2Þ ¼ sð2Þ1 ¼ 0; sð2Þ2 ¼ 0; sð2Þ3 ¼ 0; . . . ; sð2Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð3Þ ¼ sð3Þ1 ¼ 0; sð3Þ2 ¼ 2; sð3Þ3 ¼ 0; . . . ; sð3Þh ¼ 0
� �

or ð0; 0; 0; . . . ; 0Þ

Tð4Þ ¼ sð4Þ1 ¼ 0; sð4Þ2 ¼ 0; sð4Þ3 ¼ 1; . . . ; sð4Þh ¼ 1
� �

or ð0; 0; 1; . . . ; 1Þ

Obviously, the value of c4 ¼
Ph

t¼1 s
ð4Þ
t is larger than that of c1, c2, and c3.▪

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 11/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Inserting numbers into the combination set of compact groups
After creating the first combination set of compact group Q(1) and obtaining a new
incoming sequence, the current compact groups must be updated according to the number
in the incoming sequence. There are seven possible cases where a new incoming number
can be inserted into any compact group or in between a compact group and a single
number. Let the a new incoming number da is located according to each case as follows.
Q(t) is the set of combinations of compact groups and a set of single numbers at time t.

Case 1: da is at the front of Q
(t). Case 2: da is at the rear of Q

(t). Case 3: da is in a
compact group (uj, vj)

(p). Case 4: da is in between two compact groups (uj, vj)
(pj) and

(uk, vk)
(pk). Case 5: da is in between a single number wj and a compact group (uk, vk)

(pk).
Case 6: da is in between a compact group (uj, vj)

(pj) and a single number wk. Case 7: da is in
between two single numbers wj and wj + 1.

The details of each case and the insertion steps are in given in Algorithm 2.

EXPERIMENTAL RESULTS AND DISCUSSION
Three issues are discussed in this section. The first issue illustrates the snapshot of sorting
outcomes as the results of incoming data chunks, current compact groups of different
types, and sets of single numbers. The second issue discusses the relation between the
sorting time and the number of streaming numbers. The third issue shows how the size of
working storage changes during the sorting process.

Sorting examples
The proposed algorithms are implemented in MATLAB R2016a. The computing results are
run on 3.4 GHz Intel Core i7 6700 and 16 GB of 2400 MHz RAM with the Windows 10
platform. To illustrate how the proposed algorithm works, three experiments were
conducted by using a set of 100 single integers ranging from 1 to 100. These 100 numbers
were randomly permuted to produce three different experimental data sets. The total size of
storage is assumed to have only 60 working addresses. Forty of them are used for storing
temporary data generated during the sorting process, which includes WðtÞ, W(t), and Q(t) at
different times. The rest of storage is for storing some variables in the sorting program.

To illustrate the continuous results during the sorting process, three data sets in the
experiment were generated from three permutations of integer numbers from 1 to 100 to
avoid any duplication. These permuted numbers are sliced into a set of input chunks of at
most 40 numbers in each chunk. Let mtotj j be the total size of the working storage, which
is equal to 60 in the experiment. The experimental results are shown in Fig. 3, where the
x-axis represents each wi in W

(t) and Q(t) and the y-axis represents the time line of iterations.
Each datum wi is represented by ×. Each type of compact group in Q

(t) is denoted by a solid
line with a specific color as follows.

Type-1 is denoted by gray line.

Type-2 is denoted by blue line.

Type-3 is denoted by green line.

Type-4 is denoted by red line.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 12/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


A single number in the compact window is represented by •.
In the first data set, there are 40 numbers entering the process at the starting time. After

being grouped by Algorithm 1, the result appears in time step t = 1 (at the line above
the bottom line in Fig. 3A. There are four compact groups of type-1, two compact groups
of type-2, three compact groups of type-4, and nine single numbers. Also, at time t = 1,
there are four new incoming numbers, each of which is represented by ×. Algorithm 1 sorts

Algorithm 1 Creating compact groups.

Input: a sorted sequence W(t) = (wk, wk+1, …, wk+h) of length h at time t.

Output: a combination of a set of compact groups and a set of single numbers.

1. j = k.

2. S = Ø. /* set of single numbers */

3. C = Ø. /* set of compact groups */

4. Q(t) = Ø.

5. For l = 1 to 2 do /* packing order types 1, 4 before 2, 3 */

6. c1 = c2 = c3 = c4 = 0.

7. If |wk −wk+1| > 2 then

8. S = S∪{wk}.

9. j = k+1.

10. EndIf

11. For i = k+1 to k+h−1 do

12. If |wi−wi+1| ≤ 2 then

13. Set the values of s(1), s(2), s(3), s(4) by definitions 7–10.

14. cl = cl +s
(l).

15. c5−l = c5−l +s
(5−l).

16. else

17. If j = i then

18. S = S∪{wj}. /* single number */

19. j = i +1.

20. else

21. p = arg maxi2fl;5−lg (ci). /* compact types */

22. Create a compact group (wj, wi)
(p).

23. C = C∪{(wj, wi)
(p)}.

24. c1 = c2 = c3 = c4 = 0.

25. j = i + 1.

26. EndIf

27. EndIf

28. Q(t) = Q(t)||S||C.

29. EndFor

30. EndFor

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 13/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Algorithm 2 Inserting dα into Q(t).

Input: (1) set Q(t). (2) a new number da.

Output: a new Q(t+1).

1. Identify the case of insertion for da.

2. Case:

3. 1: If the first element of Q(t) is (u1, v1)
(p1)

then

4. Let U be the retrieved (u1, v1)
(p1) by using Eqs. (2), (3) and (4).

5. else

6. Put d1 in U.

7. EndIf

8. Use Algorithm 1 with da and U to generate a new set of elements.

9. EndCase

10. 2: If the last element of Q(t) is (um, vm)
(pm)

then

11. Let U be the retrieved (um, vm)
(pm) by using Eqs. (2), (3) and (4).

12. else

13. Put dm in U.

14. EndIf

15. Use Algorithm 1 with da and U to generate a new set of elements.

16. EndCase

17. 3: Let U be the retrieved (um, vm)
(pm) by using Eqs. (2), (3) and (4).

18. Use Algorithm 1 with da and U to generate a new set of elements.

19. EndCase

20. 4: Let U be the retrieved (uj, vj)
(pj) by using Eqs. (2), (3) and (4).

21. Let V be the retrieved (uk, vk)
(pk) by using Eqs. (2), (3) and (4).

22. Use Algorithm 1 with da, U and V to generate a new set of elements.

23. EndCase

24. 5: Let U be the retrieved (uk, vk)
(pk) by using Eqs. (2), (3) and (4).

25. Use Algorithm 1 with da, wj and U to generate a new set of elements.

26. EndCase

27. 6: Let U be the retrieved (uj, vj)
(pj) by using Eqs. (2), (3) and (4).

28. Use Algorithm 1 with da, U and wk to generate a new set of elements.

29. EndCase

30. 7: Use Algorithm 1 with da, wj and wk to generate a new set of elements.

31. EndCase

32. Repeat

33. Use Algorithm 1 with the new set of elements and the unpacked element next to the new set
next to the new set of elements to generate the next new set of elements in Q(t).

34. Until no more new elements.

35. Rename Q(t) as Q(t+1).

36. EndCase

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 14/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


all incoming 100 numbers appearing in various chunks within only 19 time steps, whereas
it takes 21 and 18 time steps for the second and the last data sets, respectively.

Execution time vs working storage size
In this experiment, the relation between the total numbers to be sorted and the sorting
time was investigated. The total storage, mtot, is partitioned into two portions. The first
portion, mprog, is for the sorting program. The size of mprog is fixed throughout the sorting
process. The second portion, mwork, is the working storage for storing all compact groups,
sets of single numbers, and other relevant variables occurring during in the sorting
algorithm. Since the sorting time directly depends upon the size of mwork, the size of mwork
is thus set as a function of the total numbers to be sorted. Let n ≫|mwork| be the total

Figure 3 Snapshots of sorting results from three different permuted data sets, each of which contains
100 numbers. (A) The time steps of the sorting result of data set 1. (B) The time steps of the sorting
result of data set 2. (C) The time steps of the sorting result of data set 3.

Full-size DOI: 10.7717/peerj-cs.355/fig-3

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 15/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-3
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


numbers to be sorted. All numbers to be sorted flow gradually and continuously into the
working storage one chunk at an initial time. To investigate the execution time of the
sorting process with respect to the quantity of numbers and |mwork|, the size of mwork is set
in terms of n as follows.

jmworkj ¼ g � n (5)
where γ ∈ {0.50, 0.45, 0.40, 0.35} and n ∈ {103, 104, 105, 106}.

Table 3 summarizes the proposed sorting algorithm time of different quantities of
incoming numbers with respect to the different sizes of the working memory. The incoming
numbers were randomly generated and permuted. No duplicated numbers appear in the
data sets. To visualize the trend of the sorting time vs the size of data sets, Fig. 4 shows the
log-scaled trend of each data set. There are four lines in blue, red, yellow, and purple
representing different sizes of mwork. Note that the sorting time of each data set linearly

Figure 4 Log-scaled sorting execution time in seconds for different sizes of working storage n = 103,
104, 105 and 106. Full-size DOI: 10.7717/peerj-cs.355/fig-4

Table 3 Sorting execution time of the proposed algorithm with respect to size of working storage.

n Execution time (s)

γ = 50% γ = 45% γ = 40% γ = 35%

103 0.87 0.98 1.16 1.30

104 7.15 8.08 9.33 11.02

105 71.06 79.17 92.80 109.88

106 709.49 791.17 933.80 1,065.71

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 16/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-4
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


increases. Then, the experiment has a linear polynomial time complexity of O(n). Table 4
summarizes the time of external sorting of different quantities of incoming numbers with
respect to the different sizes of buffer. The execution time of proposed sorting algorithm is
approximately 8.72 times faster than the execution time of external sorting at a million
data when storage size is limited to 50%. Furthermore, the proposed algorithm run on 4 GB
of numeric data takes about 4.21 days.

Fluctuation of compact groups and single number sets
Since the proposed sorting algorithm is designed to cope with a streaming data
environment where the set of numbers to be sorted can overflow the working storage and
the chunks of numbers gradually flow into the working storage, there are three interesting
behavioral periods concerning the number of compact groups and sets of single
numbers created during the sorting process. It is remarkable that the number of compact
groups and sets of single numbers increase during the beginning period due to random
values of incoming numbers. The length of beginning period depends upon the random
sequence of numbers, which is unpredictable. After the beginning period, some new
incoming numbers may fall into the existing compact groups and some of them may form
new compact groups with some sets of single numbers. Some existing compact groups can be
merged with new compact groups created from some sets of single numbers into
new compact groups. These conditions make the number of compact groups almost stable
for some period of time. In the last period, those new incoming numbers obviously fall
to combine with the existing compact groups. Some sequences of compact groups are
possibly merged into new compact groups with more elements in the groups. Thus, the
number of compact groups decreases until there is one compact group that contains all
sorted numbers. Figure 5 illustrates the fluctuation of compact groups with sets of single
numbers vs the time steps for different sizes of working storage. During the sorting process,
the number of compact groups and sets of single numbers increases and decreases. The
fluctuation of used and unused areas of working storage of the results in Fig. 5 is summarized
in Fig. 6. Notice that the proposed algorithm can reduce the working space to 65% of the data
size. In the other words, the working space of the proposed algorithm is 35% of the data size.

Comparison of storage size used and correctness of sorted order
Regardless of the sorting types, either exact sort or approximate sort, the order of each
number in the sorted list must be correct according to the value of each number for

Table 4 Sorting execution time of external sorting with respect to buffer size.

n Execution time (s)

Buffer = 50% Buffer = 45% Buffer = 40% Buffer = 35%

103 2.30 2.69 2.64 2.66

104 26.95 21.04 20.19 20.18

105 209.41 207.27 206.87 205.15

106 6,187.14 5,312.09 4,754.30 4,539.04

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 17/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Figure 6 Snapshot of fluctuation of unused area of working storage at a million data during the
sorting period for different sizes of working storage. Full-size DOI: 10.7717/peerj-cs.355/fig-6

Figure 5 Snapshot of fluctuation of compact groups and sets of single numbers at a million data
during the sorting period for different sizes of working storage.

Full-size DOI: 10.7717/peerj-cs.355/fig-5

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 18/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-6
http://dx.doi.org/10.7717/peerj-cs.355/fig-5
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


both ascending and descending sorts. If it is not so, the sorted list is useless in any
applications. To verify the efficiency and the accurate order of sorted numbers as the result
of proposed streaming data sort in a streaming data environment with limited storage size,
the result was compared with the result of approximate sorting algorithm (Farnoud,
Yaakobi & Bruck, 2016) capable of handling streaming data, and external sorting. The
following set of numbers was experimented: {18, 1, 10, 6, 2, 12, 9, 3, 16, 19, 14, 11, 17, 13,
20, 8, 4, 15, 5, 7}. In order to simulate streaming data, the set of numbers was decomposed
into several consecutive chunks. The first incoming chunk contains nine numbers.
The other following chunks contain only one number. Two issues concerning the change
of storage size during the sorting process and the wrong sorted order were recorded in the
experiment. Since streaming data sort algorithm uses only one working storage of fixed
size throughout the sorting process, there is no change of storage size for this algorithm.
But in case of approximate sorting and external sorting algorithms, both of them require
working storage of fixed size and also external storage of variable size. Hence, the
change of storage size only occurs in the external storage. Figure 7 snapshots the storage
size change at each time. mwork is a constant denoting the fixed size of working storage.
The size of external storage is expandable according to the amount of temporary data
generated during the sorting algorithms.

It is remarkable that the proposed streaming data sort does not require any space in the
external storage. Only working storage space alone is enough to complete the sorting

Figure 7 Comparison of storage size change as the results of sorting a set of streaming data by the
proposed streaming data sort algorithm, approximate sorting (Farnoud, Yaakobi & Bruck, 2016)
algorithm and external sorting. Full-size DOI: 10.7717/peerj-cs.355/fig-7

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 19/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-7
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


process. But both approximate sorting and external sorting need some additional spaces
in the external storage. These spaces keep increasing when there are more incoming
numbers to be sorted. The sorted order of all numbers obtained from streaming data sort is
perfectly correct. But the sorted orders of numbers 7, 8, 13, 14 obtained from approximate
sorting are not correct. Although the sorted order obtained from external sorting is
perfectly correct, this algorithm requires a large size of external storage which is
impractical for streaming data environment.

Time complexity analysis
There are two main phases in the sorting process. The first phase is to sort the first
incoming chunk of numbers to obtain the first set of compact groups as well as sets of
single numbers. The second phase is to sort the consequent chunks with the existing
compact groups and single numbers. Let h ≤ |mwork| be the size of each input chunk.
The time complexity of each phase is as follows.

Phase 1: The operation of this phase is in Algorithm 1. Obtaining h numbers takes O(h).
These h numbers must be sorted to create compact groups and sets of single numbers.
The time to sort h numbers is O(h log (h)). After sorting, the time to create compact groups
and sets of single numbers takes O(h). Thus, the time of this phase is O(h) + O(h log(h)) +
O(h) = O(h log(h)).

Phase 2: From Algorithm 1, all compact groups at any time are in set C, and all single
numbers are in set S. The time complexity of this phase can be analyzed from Algorithm 2.
There are seven cases to be identified for inserting a new number da at step 1. The
identifying time takes O(|C|) + O(|S|) ¼ maxðOðjCjÞ; OðjSjÞÞ. Then, applying Eqs. (2), (3)
and (4) to retrieve the numbers from a compact group takes O(1). After retrieval of
the numbers, Algorithm 1 is applied to create a new compact group and a set of single
numbers with the new incoming da. This step takes at most O(h). At steps 32–34, Algorithm
1 is repeatedly applied to update sets C and S. This takes at most O(h×|C|) + O(|S|).
Since |C| ≤ h and |S| ≤ h, the time complexity of steps 32–34 is O(h2). Thus, phase 2 takes
max(O(|C|), O(|S|)) + O(1) + O(h) + O(h2) = O(h2) for each da. If there are in total n
numbers to be sorted, then the time complexity is O(h log(h)) + O((n − h) × h2) = O(nh2).
However, h is a constant. Hence, the time complexity of the sorting process is O(n).

Storage usage analysis
The behavior of storage usage is in the form of a capsized bell shape, as shown in Fig. 5.
The descriptive rationale behind this behavior was briefly provided in Fluctuation of
Compact Groups and Single Number Sets Section. This section will theoretically analyze
this behavior based on the probability of all seven cases for a compact group. Suppose
there are n total streaming numbers to be sorted. All incoming n numbers are assumed to
be randomly permuted and partitioned into nh input chunks of size h each. Let ni be the
numbers in the ith input data chunk. After obtaining the first input data chunk, the
probability of each case for the next new incoming number da for any compact group qi is
as follows.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 20/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Case 1: da is at the front of qi. The probability of case 1 is calculated by the probability of
picking da from n − n1 and the probability of having da in the next input chunk. The
probability of picking da from n − n1 numbers is

1
n�n1. However, if da is the next new

incoming number, then da must be in the next input data chunk. The probability that da is
in the next input chunk is 1n

h�1
. Thus, the probability of case 1 is as follows.

p1 ¼
1

n � n1
� 1n
h � 1

(6)

Case 2: da is at the rear of qi. The probability of case 2 can be analyzed as that of case 1.
Case 3: da is in a compact group qi, types 2, 3, and 4 are compact groups only.
If qi is a type-2 compact group, then the probability that da is in qi is

�jqij
2

�
� 1n�n1,

where |qi| represents the numbers compacted in qi.
If qi is a type-3 compact group, then the probability that da is in qi is

�jqij
2

�
� 1n�n1.

If qi is a type-4 compact group, then the probability that da is in qi is
jqij�1ð Þ
n�n1 .

p3 ¼

�jqij
2

�
� 1
n � n1

� 1n
h � 1

types 2 or 3

jqij � 1
n � n1

� 1n
h � 1

type 4:

8>><
>>: (7)

Case 4: da is in between two compact groups qi and qi + 1. The probability of case 4 can
be analyzed as that of case 1.

Case 5: da is in between a single number wj and a compact group qi.
If qi is a type-1 compact group, then the probability that da is in qi is

1
n�n1.

If qi is a type-2 compact group, then the probability that da is in qi is
1

n�n1.
If qi is a type-3 compact group, then the probability that da is in qi is

1
n�n1.

If qi is a type-4 compact group, then the probability that da is in qi is
1

n�n1.
The probability of case 5 can be analyzed as that of case 1.
Case 6: da is in between a compact group qi and a single number wk. The probability of

case 6 can be analyzed as that of case 1.
Case 7: da is in between two single numbers wj and wj + 1. The probability of case 7 can

be analyzed as that of case 1.
Note that the probability of all cases for the first input data chunk is written as follows.

p ¼

jqij
2

� �
� 1
n � n1

� 1n
h � 1

case 3 ðtype 2 or type 3Þ
jqij � 1
n � n1

� 1n
h � 1

case 3 ðtype 4Þ
1

n � n1
� 1n
h � 1

other cases

8>>>>>><
>>>>>>:

(8)

After the first input data chunk, the probability of each case after m next input data
chunks can be written in a generic form as follows.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 21/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


p ¼

jqij
2

� �
� 1
n � Pmi¼1 ni �

1
n
h � m

case 3 ðtype 2 or type 3Þ
jqij � 1

n � Pmi¼1 ni �
1

n
h � m

case 3 ðtype 4Þ
1

n � Pmi¼1 ni �
1

n
h � m

other cases

8>>>>>><
>>>>>>:

(9)

Note that the value n � Pmi¼1 ni and nh � m will finally approach 1. This implies that the
number of compact groups decreases and eventually there should be only one compact
group. However, the time during which the probability approaches 1 depends upon the
value of h, as shown in Fig. 8. If h is large, then the chance that an input chunk contains a
tentative sorted sequence is also high.

Theorem 2 The only possible existing case to be tested for the last incoming data chunk is
case 1, case 2, case 4, case 5, case 6, or case 7.

Proof: The only probability approaching 1 is the probability of case 1, case 2, case 4, case 5,
case 6, and case 7 as defined in Eqs. (9).▪

CONCLUSION
This study proposed a concrete concept and practical algorithm to sort streaming numbers
in the case where the total numbers overflow the actual storage. No secondary storage is
involved in this constraint. The size of the working storage, h, for sorting is fixed

Figure 8 Probability of cases 1, 2, 4, 5, 6 and 7 of 100 data where h = 4.
Full-size DOI: 10.7717/peerj-cs.355/fig-8

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 22/27

http://dx.doi.org/10.7717/peerj-cs.355/fig-8
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


throughout the sorting event. The incoming numbers are captured by new proposed data
architectures in the forms of sets of single numbers and compact groups of sorted
numbers. The actual order of each number with respect to the total numbers, n, in the
streaming sequence can be correctly retrieved within O(h). The time complexity of the
proposed algorithm is O(n), and the space complexity is O(M). From the experiments, it
was found that the proposed algorithm can correctly and stably handle the streaming data
size of at least 2.857 times larger than the size of the working storage. Furthermore, the
sorted order obtained from the proposed algorithm is absolutely correct, no approximate
order. In addition, each number can be directly retrieved from any compact group by
its type. The analysis of dynamic change of used and unused working storage areas during
the sorting process was also provided.

Although the proposed algorithm is primarily designed for a single processor, the
proposed algorithm can be practically extended to be implemented on a multiprocessor
architecture with a slight modification. In the case of a multiprocessor architecture, more
than one chunk of data can simultaneously flow into the machine by one chunk per
processor. The proposed algorithm can be deployed by each processor to sort each
incoming chunk and to merge the final sorted results from all processors later. In fact,
there are several real applications requiring this kind of sorting process where the data
always overflow the working memory. Some applications are the followings:

1. Managing tremendous information inside large organizations by sorting transactions
according to account numbers, locations of customers, date stamp, price or popularity of
stock, ZIP code or address of mail, and so on (Sedgewick & Wayne, 2011). The proposed
algorithm can reduce memory storage for keeping those data.

2. Reducing the search time of huge streaming data by sorting the data first and
representing them in compact groups as implemented in streaming data sort algorithm.

3. Computing order statistics, quartile, decile, and percentile of big streaming data
continuously flowing into an internet-scale network monitoring system and database
query optimization (Buragohain & Suri, 2009).

4. Checking duplicated data for fraud detection or fake social engagement activities such as
bidding on an item, filling out a form, clicking an advertisement, or making a purchase
(Metwally, Agrawal & El Abbadi, 2005; Li et al., 2016).

Even though the proposed streaming data sort successfully sorts the streaming data
under the defined constraints but some of the following further studies of streaming data
sorting based on other constraints can be pursued.

1. Developing a new structure of compact group whose type can be adapted to any
arbitrary different value of two temporal consecutive numbers.

2. Extending the sorting concept to cope with various data types such as a character string
or a floating point number which exist in other engineering, scientific, and business
problems.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 23/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was supported by Development and Promotion of Science and Technology
Talents Project (DPST) and Thailand Research Fund under grant number RTA6080013.
The funders had no role in study design, data collection and analysis, decision to publish,
or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
Development and Promotion of Science and Technology Talents Project (DPST).
Thailand Research Fund: RTA6080013.

Competing Interests
The authors declare that they have no competing interests.

Author Contributions
� Suluk Chaikhan conceived and designed the experiments, performed the experiments,
analyzed the data, performed the computation work, prepared figures and/or tables,
authored or reviewed drafts of the paper, and approved the final draft.

� Suphakant Phimoltares conceived and designed the experiments, analyzed the data,
prepared figures and/or tables, authored or reviewed drafts of the paper, and approved
the final draft.

� Chidchanok Lursinsap conceived and designed the experiments, analyzed the data,
prepared figures and/or tables, authored or reviewed drafts of the paper, and approved
the final draft.

Data Availability
The following information was supplied regarding data availability:

Data, including permutations of random integers, are available at the UCI repository
(https://archive.ics.uci.edu/ml/datasets.php). Specifically:

- Beijing PM2.5 Data Data Set: https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data
- Census Income Data Set: http://archive.ics.uci.edu/ml/datasets/Census+Income
- Covertype Data Set: https://archive.ics.uci.edu/ml/datasets/covertype
- Diabetes Data Set: https://archive.ics.uci.edu/ml//datasets/Diabetes
- PM2.5 Data of Five Chinese Cities Data Set: https://archive.ics.uci.edu/ml/datasets/

PM2.5+Data+of+Five+Chinese+Cities
- Incident management process enriched event log Data Set:
https://archive.ics.uci.edu/ml/datasets/Incident+management+process+enriched+event+log
- KEGG Metabolic Relation Network (Directed) Data Set: https://archive.ics.uci.edu/ml/

datasets/KEGG+Metabolic+Relation+Network+(Directed)

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 24/27

https://archive.ics.uci.edu/ml/datasets.php
https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data
http://archive.ics.uci.edu/ml/datasets/Census+Income
https://archive.ics.uci.edu/ml/datasets/covertype
https://archive.ics.uci.edu/ml//datasets/Diabetes
https://archive.ics.uci.edu/ml/datasets/PM2.5+Data+of+Five+Chinese+Cities
https://archive.ics.uci.edu/ml/datasets/PM2.5+Data+of+Five+Chinese+Cities
https://archive.ics.uci.edu/ml/datasets/Incident+management+process+enriched+event+log
https://archive.ics.uci.edu/ml/datasets/KEGG+Metabolic+Relation+Network+(Directed)
https://archive.ics.uci.edu/ml/datasets/KEGG+Metabolic+Relation+Network+(Directed)
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


- KEGG Metabolic Reaction Network (Undirected) Data Set: https://archive.ics.uci.edu/
ml/datasets/KEGG+Metabolic+Reaction+Network+(Undirected)

- Buzz in social media Data Set: https://archive.ics.uci.edu/ml/datasets/Buzz+in+social
+media+.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/
peerj-cs.355#supplemental-information.

REFERENCES
Agrawal A, Sriram B. 2015. Concom sorting algorithm. In: 015 4th International Conference on

Computer Science and Network Technology. Vol. 1. 229–233.

Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M. 2015. Internet of things:
a survey on enabling technologies, protocols, and applications. IEEE Communications Surveys &
Tutorials 17(4):2347–2376 DOI 10.1109/COMST.2015.2444095.

Babcock B, Babu S, Datar M, Motwani R, Widom J. 2002. Models and issues in data stream
systems. In: Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on
Principles of Database Systems, PODS ’02. New York, 1–16.

Bey Ahmed Khernache M, Laga A, Boukhobza J. 2018. Montres-nvm: an external sorting
algorithm for hybrid memory. In: 2018 IEEE 7th Non-Volatile Memory Systems and Applications
Symposium (NVMSA). Piscataway: IEEE, 49–54.

Buragohain C, Suri S. 2009. Quantiles on streams. In: Encyclopedia of Database Systems.
New York: Springer-Verlag US, 2235–2240.

Cardenas A, Manadhata P, Rajan S. 2013. Big data analytics for security. IEEE Security & Privacy
11(6):74–76 DOI 10.1109/MSP.2013.138.

Cormen TH, Leiserson CE, Rivest RL, Stein C. 2009. Introduction to algorithms. Third Edition.
Cambridge: The MIT Press.

Dave M, Gianey H. 2016. Different clustering algorithms for big data analytics: a review. In: 2016
International Conference System Modeling & Advancement in Research Trends. 328–333.

Dua D, Graff C. 2019. UCI machine learning repository: school of information and computer
science. Irvine: University of California.

Elder M, Goh YK. 2018. Permutations sorted by a finite and an infinite stack in series.
In: International Conference on Language and Automata Theory and Applications. 220–231.

Farnoud F, Yaakobi E, Bruck J. 2016. Approximate sorting of data streams with limited storage.
Journal of Combination Optimization 32:1133–1164.

Faro S, Marino FP, Scafiti S. 2020. Fast-insertion-sort: a new family of efficient variants of the
insertion-sort algorithm. In: SOFSEM (Doctoral Student Research Forum). 37–48.

Franceschini G. 2004. Proximity mergesort: optimal in-place sorting in the cache-oblivious model.
In: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA.
Vol. 15. New York: ACM, 291–299.

Gantz J, Reinsel D. 2012. The digital universe in 2020: big data, bigger digital shadows, and biggest
growth in the far east. IDC iView: IDC Analyze the future 2007(2012):1–16.

Goel S, Kumar R. 2018. Brownian motus and clustered binary insertion sort methods: an efficient
progress over traditional methods. Future Generation Computer Systems 86(4):266–280
DOI 10.1016/j.future.2018.04.038.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 25/27

https://archive.ics.uci.edu/ml/datasets/KEGG+Metabolic+Reaction+Network+(Undirected)
https://archive.ics.uci.edu/ml/datasets/KEGG+Metabolic+Reaction+Network+(Undirected)
https://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+
https://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+
http://dx.doi.org/10.7717/peerj-cs.355#supplemental-information
http://dx.doi.org/10.7717/peerj-cs.355#supplemental-information
http://dx.doi.org/10.1109/COMST.2015.2444095
http://dx.doi.org/10.1109/MSP.2013.138
http://dx.doi.org/10.1016/j.future.2018.04.038
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Huang X, Liu Z, Li J. 2019. Array sort: an adaptive sorting algorithm on multi-thread.
Journal of Engineering 2019(5):3455–3459 DOI 10.1049/joe.2018.5154.

Idrizi F, Rustemi A, Dalipi F. 2017. A new modified sorting algorithm: a comparison with state of
the art. In: 2017 6th Mediterranean Conference on Embedded Computing. 1–6.

Kanza Y, Yaari H. 2016. External sorting on flash storage: reducing cell wearing and increasing
efficiency by avoiding intermediate writes. VLDB Journal 25(4):495–518
DOI 10.1007/s00778-016-0426-5.

Katal A, Wazid M, Goudar R. 2013. Big data: Issues, challenges, tools and good practices. In: 2013
Sixth International Conference on Contemporary Computing. 404–409.

Kehoe B, Patil S, Abbeel P, Goldberg K. 2015. A survey of research on cloud robotics and
automation. IEEE Transactions on Automation Science and Engineering 12(2):398–409.

Keim D, Qu H, Ma K-L. 2013. Big-data visualization. IEEE Computer Graphics and Applications
33(4):20–21.

Laga A, Boukhobza J, Singhoff F, Koskas M. 2017. Montres: merge on-the-run external sorting
algorithm for large data volumes on ssd based storage systems. IEEE Transactions on Computers
66(10):1689–1702.

Lee Y-S, Quero LC, Kim S-H, Kim J-S, Maeng S. 2016. Activesort: efficient external sorting using
active ssds in the mapreduce framework. Future Generation Computer Systems 65:76–89.

Li Y, Martinez O, Chen X, Li Y, Hopcroft JE. 2016. In a world that counts: clustering and detecting
fake social engagement at scale. Republic and Canton of Geneva: International World Wide Web
Conferences Steering Committee.

Liang Y, Chen T, Chang Y, Chen S, Wei H, Shih W. 2020. B*-sort: enabling write-once sorting for
non-volatile memory. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems. Piscataway: IEEE.

Mehmood A, Natgunanathan I, Xiang Y, Hua G, Guo S. 2016. Protection of big data privacy.
IEEE Access 4:1821–1834.

Metwally A, Agrawal D, El Abbadi A. 2005. Duplicate detection in click streams. In: Proceedings of
the 14th International Conference on World Wide Web. 12–21.

Mohammed AS, Amrahov a E, Çelebi FV. 2017. Bidirectional conditional insertion sort
algorithm; an efficient progress on the classical insertion sort. Future Generation Computer
Systems 71:102–112.

Osama H, Omar Y, Badr A. 2016. Mapping sorting algorithm. In: 2016 SAI Computing Conference
(SAI). 488–491.

O’Malley O. 2008. Terabyte sort on apache hadoop. Available at http://sortbenchmark. org/Yahoo-
Hadoop.

Sedgewick R, Wayne K. 2011. Algorithms. Boston: Addison-Wesley Professional.

Singh H, Sarmah M. 2015. Comparing rapid sort with some existing sorting algorithms. In: Das K,
Deep K, Pant M, Bansal J, Nagar A, eds. Proceedings of Fourth International Conference on Soft
Computing for Problem Solving: Advances in Intelligent Systems and Computing. Vol. 335.
New Delhi: Springer.

Tambouratzis T. 1999. A novel artificial neural network for sorting. IEEE Transactions on Systems,
Man, and Cybernetics: Part B 29(2):271–275.

Thusoo A, Shao Z, Anthony S, Borthakur D, Jain N, Sarma J, Murthy R, Liu H. 2010. Data
warehousing and analytics infrastructure at facebook. In: Proceedings of the ACM SIGMOD
International Conference on Management of Data, SIGMOD 2010. 1013–1020.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 26/27

http://dx.doi.org/10.1049/joe.2018.5154
http://dx.doi.org/10.1007/s00778-016-0426-5
http://sortbenchmark.org/Yahoo-Hadoop
http://sortbenchmark.org/Yahoo-Hadoop
http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/


Vatrapu R, Mukkamala RR, Hussain A, Flesch B. 2016. Social set analysis: a set theoretical
approach to big data analytics. IEEE Access 4:2542–2571.

Vignesh R, Pradhan T. 2016. Merge sort enhanced in place sorting algorithm. In: 2016
International Conference on Advanced Communication Control and Computing Technologies
(ICACCCT). 698–704.

Witayangkurn A, Horanont T, Shibasaki R. 2012. Performance comparisons of spatial data
processing techniques for a large scale mobile phone dataset.

YiLiang S, Zhenghong Y. 2016. Communication rules learning strategy in big data network based
on SVN neural network. In: 2016 International Conference on Intelligent Transportation, Big
Data & Smart City. 309–313.

Zhai C, Zhang Y, Hu P. 2018. Modeling and analysis of material supply network based on big data
packed with traffic. In: 2018 IEEE 3rd International Conference on Cloud Computing and Big
Data Analysis. Piscataway: IEEE, 490–494.

Zhao C, Chang H, Liu Q. 2017. Bayesian algorithm based traffic prediction of big data services in
openflow controlled optical networks. In: 2017 IEEE 2nd International Conference on Big Data
Analysis (ICBDA). Piscataway: IEEE, 823–826.

Chaikhan et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.355 27/27

http://dx.doi.org/10.7717/peerj-cs.355
https://peerj.com/computer-science/

	Correct and stable sorting for overflow streaming data with a limited storage size and a uniprocessor
	Introductions
	Constraints
	Definitions and notations
	Concepts
	Proposed algorithm
	Experimental results and discussion
	Conclusion
	References


<<
  /ASCII85EncodePages false
  /AllowTransparency false
  /AutoPositionEPSFiles true
  /AutoRotatePages /None
  /Binding /Left
  /CalGrayProfile (Dot Gain 20%)
  /CalRGBProfile (sRGB IEC61966-2.1)
  /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2)
  /sRGBProfile (sRGB IEC61966-2.1)
  /CannotEmbedFontPolicy /Warning
  /CompatibilityLevel 1.4
  /CompressObjects /Off
  /CompressPages true
  /ConvertImagesToIndexed true
  /PassThroughJPEGImages true
  /CreateJobTicket false
  /DefaultRenderingIntent /Default
  /DetectBlends true
  /DetectCurves 0.0000
  /ColorConversionStrategy /LeaveColorUnchanged
  /DoThumbnails false
  /EmbedAllFonts true
  /EmbedOpenType false
  /ParseICCProfilesInComments true
  /EmbedJobOptions true
  /DSCReportingLevel 0
  /EmitDSCWarnings false
  /EndPage -1
  /ImageMemory 1048576
  /LockDistillerParams false
  /MaxSubsetPct 100
  /Optimize true
  /OPM 1
  /ParseDSCComments true
  /ParseDSCCommentsForDocInfo true
  /PreserveCopyPage true
  /PreserveDICMYKValues true
  /PreserveEPSInfo true
  /PreserveFlatness true
  /PreserveHalftoneInfo false
  /PreserveOPIComments false
  /PreserveOverprintSettings true
  /StartPage 1
  /SubsetFonts true
  /TransferFunctionInfo /Apply
  /UCRandBGInfo /Preserve
  /UsePrologue false
  /ColorSettingsFile (None)
  /AlwaysEmbed [ true
  ]
  /NeverEmbed [ true
  ]
  /AntiAliasColorImages false
  /CropColorImages true
  /ColorImageMinResolution 300
  /ColorImageMinResolutionPolicy /OK
  /DownsampleColorImages false
  /ColorImageDownsampleType /Average
  /ColorImageResolution 300
  /ColorImageDepth 8
  /ColorImageMinDownsampleDepth 1
  /ColorImageDownsampleThreshold 1.50000
  /EncodeColorImages true
  /ColorImageFilter /FlateEncode
  /AutoFilterColorImages false
  /ColorImageAutoFilterStrategy /JPEG
  /ColorACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  >>
  /ColorImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  >>
  /JPEG2000ColorACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  >>
  /JPEG2000ColorImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  >>
  /AntiAliasGrayImages false
  /CropGrayImages true
  /GrayImageMinResolution 300
  /GrayImageMinResolutionPolicy /OK
  /DownsampleGrayImages false
  /GrayImageDownsampleType /Average
  /GrayImageResolution 300
  /GrayImageDepth 8
  /GrayImageMinDownsampleDepth 2
  /GrayImageDownsampleThreshold 1.50000
  /EncodeGrayImages true
  /GrayImageFilter /FlateEncode
  /AutoFilterGrayImages false
  /GrayImageAutoFilterStrategy /JPEG
  /GrayACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  >>
  /GrayImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  >>
  /JPEG2000GrayACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  >>
  /JPEG2000GrayImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  >>
  /AntiAliasMonoImages false
  /CropMonoImages true
  /MonoImageMinResolution 1200
  /MonoImageMinResolutionPolicy /OK
  /DownsampleMonoImages false
  /MonoImageDownsampleType /Average
  /MonoImageResolution 1200
  /MonoImageDepth -1
  /MonoImageDownsampleThreshold 1.50000
  /EncodeMonoImages true
  /MonoImageFilter /CCITTFaxEncode
  /MonoImageDict <<
    /K -1
  >>
  /AllowPSXObjects false
  /CheckCompliance [
    /None
  ]
  /PDFX1aCheck false
  /PDFX3Check false
  /PDFXCompliantPDFOnly false
  /PDFXNoTrimBoxError true
  /PDFXTrimBoxToMediaBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  ]
  /PDFXSetBleedBoxToMediaBox true
  /PDFXBleedBoxToTrimBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  ]
  /PDFXOutputIntentProfile (None)
  /PDFXOutputConditionIdentifier ()
  /PDFXOutputCondition ()
  /PDFXRegistryName ()
  /PDFXTrapped /False

  /CreateJDFFile false
  /Description <<
    /CHS <FEFF4f7f75288fd94e9b8bbe5b9a521b5efa7684002000500044004600206587686353ef901a8fc7684c976262535370673a548c002000700072006f006f00660065007200208fdb884c9ad88d2891cf62535370300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c676562535f00521b5efa768400200050004400460020658768633002>
    /CHT <FEFF4f7f752890194e9b8a2d7f6e5efa7acb7684002000410064006f006200650020005000440046002065874ef653ef5728684c9762537088686a5f548c002000700072006f006f00660065007200204e0a73725f979ad854c18cea7684521753706548679c300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c4f86958b555f5df25efa7acb76840020005000440046002065874ef63002>
    /DAN <FEFF004200720075006700200069006e0064007300740069006c006c0069006e006700650072006e0065002000740069006c0020006100740020006f007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400650072002000740069006c0020006b00760061006c00690074006500740073007500640073006b007200690076006e0069006e006700200065006c006c006500720020006b006f007200720065006b007400750072006c00e60073006e0069006e0067002e0020004400650020006f007000720065007400740065006400650020005000440046002d0064006f006b0075006d0065006e0074006500720020006b0061006e002000e50062006e00650073002000690020004100630072006f00620061007400200065006c006c006500720020004100630072006f006200610074002000520065006100640065007200200035002e00300020006f00670020006e0079006500720065002e>
    /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e002000410064006f006200650020005000440046002d0044006f006b0075006d0065006e00740065006e002c00200076006f006e002000640065006e0065006e002000530069006500200068006f00630068007700650072007400690067006500200044007200750063006b006500200061007500660020004400650073006b0074006f0070002d0044007200750063006b00650072006e00200075006e0064002000500072006f006f0066002d00470065007200e400740065006e002000650072007a0065007500670065006e0020006d00f60063006800740065006e002e002000450072007300740065006c006c007400650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f00620061007400200075006e0064002000410064006f00620065002000520065006100640065007200200035002e00300020006f0064006500720020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e>
    /ESP <FEFF005500740069006c0069006300650020006500730074006100200063006f006e0066006900670075007200610063006900f3006e0020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f0073002000640065002000410064006f0062006500200050004400460020007000610072006100200063006f006e00730065006700750069007200200069006d0070007200650073006900f3006e002000640065002000630061006c006900640061006400200065006e00200069006d0070007200650073006f0072006100730020006400650020006500730063007200690074006f00720069006f00200079002000680065007200720061006d00690065006e00740061007300200064006500200063006f00720072006500630063006900f3006e002e002000530065002000700075006500640065006e00200061006200720069007200200064006f00630075006d0065006e0074006f00730020005000440046002000630072006500610064006f007300200063006f006e0020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200035002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e>
    /FRA <FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f00620065002000500044004600200070006f007500720020006400650073002000e90070007200650075007600650073002000650074002000640065007300200069006d007000720065007300730069006f006e00730020006400650020006800610075007400650020007100750061006c0069007400e90020007300750072002000640065007300200069006d007000720069006d0061006e0074006500730020006400650020006200750072006500610075002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200035002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e>
    /ITA <FEFF005500740069006c0069007a007a006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e00740069002000410064006f006200650020005000440046002000700065007200200075006e00610020007300740061006d007000610020006400690020007100750061006c0069007400e00020007300750020007300740061006d00700061006e0074006900200065002000700072006f006f0066006500720020006400650073006b0074006f0070002e0020004900200064006f00630075006d0065006e007400690020005000440046002000630072006500610074006900200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000410064006f00620065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e>
    /JPN <FEFF9ad854c18cea51fa529b7528002000410064006f0062006500200050004400460020658766f8306e4f5c6210306b4f7f75283057307e30593002537052376642306e753b8cea3092670059279650306b4fdd306430533068304c3067304d307e3059300230c730b930af30c830c330d730d730ea30f330bf3067306e53705237307e305f306f30d730eb30fc30d57528306b9069305730663044307e305930023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200035002e003000204ee5964d3067958b304f30533068304c3067304d307e30593002>
    /KOR <FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020b370c2a4d06cd0d10020d504b9b0d1300020bc0f0020ad50c815ae30c5d0c11c0020ace0d488c9c8b85c0020c778c1c4d560002000410064006f0062006500200050004400460020bb38c11cb97c0020c791c131d569b2c8b2e4002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200035002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e>
    /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken voor kwaliteitsafdrukken op desktopprinters en proofers. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.)
    /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200066006f00720020007500740073006b00720069006600740020006100760020006800f800790020006b00760061006c00690074006500740020007000e500200062006f007200640073006b0072006900760065007200200065006c006c00650072002000700072006f006f006600650072002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006500720065002e>
    /PTB <FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f0062006500200050004400460020007000610072006100200069006d0070007200650073007300f5006500730020006400650020007100750061006c0069006400610064006500200065006d00200069006d00700072006500730073006f0072006100730020006400650073006b0074006f00700020006500200064006900730070006f00730069007400690076006f0073002000640065002000700072006f00760061002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200035002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e>
    /SUO <FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f0074002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a00610020006c0061006100640075006b006100730074006100200074007900f6007000f60079007400e400740075006c006f0073007400750073007400610020006a00610020007600650064006f007300740075007300740061002000760061007200740065006e002e00200020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200035002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e>
    /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740020006600f600720020006b00760061006c00690074006500740073007500740073006b0072006900660074006500720020007000e5002000760061006e006c00690067006100200073006b0072006900760061007200650020006f006300680020006600f600720020006b006f007200720065006b007400750072002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200035002e00300020006f00630068002000730065006e006100720065002e>
    /ENU (Use these settings to create Adobe PDF documents for quality printing on desktop printers and proofers.  Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.)
  >>
  /Namespace [
    (Adobe)
    (Common)
    (1.0)
  ]
  /OtherNamespaces [
    <<
      /AsReaderSpreads false
      /CropImagesToFrames true
      /ErrorControl /WarnAndContinue
      /FlattenerIgnoreSpreadOverrides false
      /IncludeGuidesGrids false
      /IncludeNonPrinting false
      /IncludeSlug false
      /Namespace [
        (Adobe)
        (InDesign)
        (4.0)
      ]
      /OmitPlacedBitmaps false
      /OmitPlacedEPS false
      /OmitPlacedPDF false
      /SimulateOverprint /Legacy
    >>
    <<
      /AddBleedMarks false
      /AddColorBars false
      /AddCropMarks false
      /AddPageInfo false
      /AddRegMarks false
      /ConvertColors /NoConversion
      /DestinationProfileName ()
      /DestinationProfileSelector /NA
      /Downsample16BitImages true
      /FlattenerPreset <<
        /PresetSelector /MediumResolution
      >>
      /FormElements false
      /GenerateStructure true
      /IncludeBookmarks false
      /IncludeHyperlinks false
      /IncludeInteractive false
      /IncludeLayers false
      /IncludeProfiles true
      /MultimediaHandling /UseObjectSettings
      /Namespace [
        (Adobe)
        (CreativeSuite)
        (2.0)
      ]
      /PDFXOutputIntentProfileSelector /NA
      /PreserveEditing true
      /UntaggedCMYKHandling /LeaveUntagged
      /UntaggedRGBHandling /LeaveUntagged
      /UseDocumentBleed false
    >>
  ]
>> setdistillerparams
<<
  /HWResolution [2400 2400]
  /PageSize [612.000 792.000]
>> setpagedevice