Cutevariant: a GUI-based desktop application to explore genetics variations


Journal Title Here, 2021, 1–8

doi: DOI HERE

Advance Access Publication Date: Day Month Year

Paper

Cutevariant: a GUI-based desktop application to
explore genetics variations

Sacha Schutz,1,2 Tristan Montier1 and Emmanuelle Genin2

1Univ Brest, CHRU Brest, Inserm, EFS, UMR 1078, GGB, 29200, Brest, France and 2Inserm, Univ Brest,

EFS, UMR 1078, GGB, 29200, Brest, France

∗Corresponding author. sacha@labsquare.org

FOR PUBLISHER ONLY Received on Date Month Year; revised on Date Month Year; accepted on Date Month Year

Abstract

Cutevariant is a user-friendly GUI based desktop application for genomic research designed to search for

variations in DNA samples collected in annotated files and encoded in the Variant Calling Format. The

application imports data into a local relational database wherefrom complex filter-queries can be built either

from the intuitive GUI or using a Domain Specific Language (DSL). Cutevariant provides more features than

any existing applications without compromising on performance. The plugin based architecture provides

highly customizable features. Cutevariant is distributed as a multiplatform client-side software under an open

source licence and is available at https://github.com/labsquare/Cutevariant. It has been designed from the

beginning to be easily adopted by IT-agnostic end-users.

Key words: genomics, DNA variant, desktop application, Domain Specific Language, Graphic User Interface

Introduction

Next-Generation Sequencing (NGS) has opened new opportunities

in genomic research such as identification of DNA variations

from Genome, Exome or Panel experiments. These data are

delivered as files encoded in the standard Variant Calling

Format (VCF version 4.0) [1] where the variations are listed

together with the genotype information of different samples.

Tools such as VEP [2] or SnpSift [3] can be use to add

annotations such as genes or functional impact. Biologists can

then filter out variants applying customized criteria on these

annotations. In medicine, the identification of mutations in rare

diseases would be a typical use case. This filtering procedure

implements sophisticated software tools that can be easily

adopted by end-users who are not necessarily IT-aware.

Several management systems have been developed to ease

the usage of the filtering step. GEMINI [4] and VariantTools [5]

are command line applications where data from the VCF files

are loaded into a relational database managed by SQLite [6].

Filtering can thus be made very efficient using the SQL query

syntax. Other tools such as SnpSift [3] or BCFtools [7] apply

filters directly while reading the VCF files line by line, thus

avoiding the need to create an intermediate data structure. This

comes at the cost of poor timing efficiency especially when it is

necessary to sort or group variants. While these tools are quite

flexible allowing any kind of filtering, the command line interface

is not very intuitive, thus reducing the incentive to use it for non

IT-specialists.

This called for the development of applications steered by

user-friendly Graphical User Interfaces (GUI). Some specializing

in diagnostics offer online solutions with a complete set of

patient management features but require uploading the VCF

files. The most popular of the kind are either private software

such as SeqOne [8] and or those distributed under the open

source licence such as the recently published VarFish [9]. A

major drawbacks of this scheme comes from the transit of a

large amount of genetic data through public networks raising on

one hand confidentiality and performance issues, and requiring

on the other hand a dedicated server which might not be

available for every end-users. Moreover, these solutions are

tailored for human species data and therefore cannot be adopted

for all end-users. GUI Applications that do not require a

server and offering an out-of-the-box solution are therefore a

preferable solution. The web-based applications VCFMiner [10],

BrowseVCF [11] and VCF.Filter [12] implement such a solution.

VCFMiner is distributed as a package container running with

Docker [13] requiring thus a customized desktop configuration.

BrowseVCF provides its own launcher making it quite user

friendly but the application is not supported anymore. Both

applications import the data from VCF files into an indexed

database and provide different GUI forms to create filters. Their

main drawback resides in the limited filter settings available

1

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

email:email-id.com
https://github.com/labsquare/Cutevariant
https://doi.org/10.1101/2021.02.10.430619


2 Short Article Title

Fig. 1: Cutevariant database schema. Only mandatory fields are displayed. fields n are dynamically created during the import step

based on the content of the VCF file

through the GUI, complex filters requiring a domain specific

language. In addition, web applications offer poor timing

performances compared to native desktop applications. Despite

the availability of these tools, many biologists still use Microsoft

Excel to filter their variants and are facing severe problems [14].

To address the shortcomings of the existing applications, we

have developed Cutevariant, a user-friendly and ergonomic

desktop application implemented in Python within the Qt5

framework. It takes full advantage of both a GUI and command

line user-interface, a Domain Specific Language called VQL

allowing the user to build complex filter expressions. It is

distributed as a multi-platform client-side software under an

open source licence. Thanks to an architecture based on plugins,

Cutevariant is fully customizable, allowing to easily extend the

application with additional features.

Materials and methods

VCF file importation and preprocessing

Cutevariant imports data from VCF files into a normalized

SQLite database (Figure 1) stored as a *.db file, and optionally

with a PED file to describe affected samples and their

relationship. Fields from variants and annotations tables are

dynamically created according to the content of the VCF

file. This importation step proceeds using a VCF parser to

produce json-like arrays tailored for populating the SQLite

database. It is based on a strategy design pattern so that any

formats can be supported by subclassing an abstract Reader

object. The available distribution supports raw VCF files and

VCF files annotated with VEP or SnpEff following the ANN

specifications [15].

Before importation into the database, data are cleaned and

normalized following the same procedure as the VT norm [16]

application: single lines of multi-allelic variants are split into

multiple lines. Computed annotations, not present in the

original file, are automatically created. As for example, the

count var field contains the number of samples that carry the

variant. It is thus possible to filter variants present in more than

N samples by filtering on this column. This feature is similar to

countVar() from the SnpSift [3] filter command.

From the Cutevariant main window, the new project button

starts a wizard and triggers the importation process. Depending

on the size of the input, the importation and indexation

process might take some time but this has only minimal impact

on the performance since this step is performed only once.

Alternatively, VCF files import can be triggered from the

command line using the Cutevariant-cli button. This feature

offers to knowledgeable experts the possibility to integrate the

import process at the end of a pipeline.

User interface layout

The main view (Figure 2) of the Cutevariant GUI displays the

list of variants together with their annotations. Several GUI

controllers allow the user to update the view and display the

list in different formats.

• fields editor: to show or hide selected annotations.
• filter editor: to build a nested list of conditional rules with

OR/AND binary operators.

• variant info: to display in an organised way all annotations
related to the currently selected variant.

• source editor: to manage different views and perform set
operations (union, intersection, difference) and bed file

intersections.

• word set: to manage lists of words used to generate simple
filters, e.g., filter all variants belonging to a given gene list

or a dbSNP list.

Most of these actions end up building a VQL query that can be

checked in the VQL-editor sub-window. The variants list can

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619


Short Article Title 3

Fig. 2: The Cutevariant main view showing the variants list sub-window (middle), different controllers sub-windows but not all are

displayed (left) and the VQL editor sub-window (bottom).

then be updated either with the controllers or by editing the

VQL query directly.

Variant Query Language (VQL)

To facilitate the composition of complex query-filters, the

application integrates a Domain Specific Language (DSL)

named Variant Query Language (VQL). The syntax of VQL

has been designed to look like a subset of the SQL language

working on a virtual database schema. It makes use of the

Python module textX [17] which provides several tools to define

a grammar and create parsers with an Abstract Syntax Tree.

VQL queries can be composed in the VQL editor sub-window.

However, to avoid forcing users to learn the VQL language, a

query can as well be defined from the GUI using the different

available controller sub-window listed above. The VQL query

is translated through the intermediary of a JSON object into a

well formatted SQL query and processed by the SQLite database

manager.

As an example, the following VQL query:

SELECT chr,pos,consequence,sample['NA1223'].gt

FROM variants

WHERE gene = 'CFTR' AND impact = 'HIGH'

is translated into the following SQL query :

SELECT DISTINCT

`variants`.`id`,

`variants`.`chr`,

`variants`.`pos`,

`annotations`.`consequence`,

`sample_NA1223`.`gt` AS "sample('NA1223').gt"

FROM variants

LEFT JOIN annotations

ON annotations.variant_id = variants.id

INNER JOIN sample_has_variant `sample_NA1223`

ON `sample_NA1223`.variant_id = variants.id

AND `sample_NA1223`.sample_id = 1

WHERE (

`annotations`.`gene` = 'CFTR'

AND `annotations`.`impact` = 'HIGH')

LIMIT 50 OFFSET 0

Filter expressions

Filter expressions are defined from the VQL WHERE clause. From

the filter editor, it is displayed as a nested set of editable

condition rules. Logical (AND/OR) and arithmetic (=, <, >,

≤, ≥, 6=, IN, NOT IN, IS NULL) operators are supported.
Regular expression using the binary ones complement operator

(∼) and a special WORDSET keyword are included as well.
This keyword allows the user to test if a fields belongs to a set

of words defined a priori. For instance, in VQL, to select all

variants from a list of a user-defined genes:

CREATE SET genes ('gene.txt')

SELECT * FROM variants WHERE gene IN WORDSET['genes']

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619


4 Short Article Title

Fig. 3: Abstract Syntax Tree (AST) of the VQL query

SELECT chr,pos,consequence FROM variants WHERE gene='CFTR'

AND impact='HIGH'. The AST is parsed into a Python object.

Group variants

The GROUP BY keyword allows the user to split the view in

two panels: left the list of groups and right the list of all

variants belonging to the selected group. With this feature the

exploration is made easier by, for instance, grouping variants by

genes helping to detect compound heterozygous.

Set operation

Just like Variant Tools, Cutevariant supports operations

between variant sets. Each query result can be stored in a view

using the CREATE VQL keywords or by clicking the corresponding

GUI button. For instance, the following query will create a new

view called new view.

CREATE new_view FROM variants WHERE gene='CFTR'

It is then possible to build a query directly from this view. The

following query returns the same output as the previous one:

SELECT chr, pos FROM new_view

Each view behaves as a set with three operations available

(difference, intersection, union) by comparing variants fields on

chr, pos, ref and alt. The following queries show how to create

a new view based on different set operation:

# difference

CREATE second_view = variants - new_view

# union

CREATE second_view = variant + new_view

# intersection

CREATE second_view = variant & new_view

Plugins architectures

The Cutevariant GUI architecture relies entirely on plugins

which source is available in the plugins directory. A

plugin consists of a module containing different Python files

implementing the creation of a Plugin class instance with

several overloaded virtual methods. Adding or removing GUI

controllers becomes therefore straightforward.

In addition, similarly to excel, cells of the variant view

can be formatted conditionally. By subclassing the Formatter

class, one can change the style of the cell with different

colors, text or icons according to the value of the cell. For

instance, impact fields with HIGH as value can be displayed

with a red background to catch the user’s attention. Currently,

Cutevariant supports only one formatters: cuteStyle.

Cutevariant allows the user to build a custom URL from a

variant and open it from an external application. This is used

for example to open a web link on a dbSNP database or to

show BAM alignment from IGV software at the corresponding

variant location.

With plugins, experienced users can customize Cutevariant

with dedicated features or create new ones and share them with

the users community.

Technical details and continuous integration

Cutevariant is a cross platform application implemented in

Python 3.7 using the Qt5 framework for the user interface

(PySide2 ≥ 5.11). The VCF parser uses the PyVCF ≥ 0.6.8
library. Syntax and parser of the VQL language rely on

the textX ≥ 1.8.0 library. SQLite3 is the database manager
interfaced with the Python standard library. The source code

and documentation are available on GitHub [18]. Continuous

integration are made on GitHub-CI and unit tests are made

with the Pytest framework [19]. The application is distributed

as windows 32 bits and 64 bits packages. Cutevariant is also

available as a Python package from the Python Package Index

Pypi [20].

Results

In Table 1 we list the features available in Cutevariant

compared to other applications available on the market.

The timing performance of Cutevariant to execute different

actions is reported in Table 2 and compared to the timing

performance of VCF-Miner, the fastest application we have

evaluated. Cutevariant outperforms VCF-Miner except for

1KG.chr22.anno.vcf. The reason comes from the large number

of samples required to compute the joint table between samples

and variants.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619


Short Article Title 5

Table 1. Features available in various applications available on the market.

GUI Command Line

Features Cutevariant BrowseVCF VCF-Miner VCF-Explorer VCF-Server VCF-Filters GEMINI Variant Tools SnpSift

process annotations no no no no yes no yes no no

VEP parser yes yes no no no no yes no no

SnpEff parser yes yes no no no no yes yes yes

SQL like query yes no no no no no yes yes yes

regular expressions yes no no no no no no∗ no∗ yes

bed file intersection yes no yes no no yes no no yes

set operations yes no no no no no no yes yes

sorting yes yes yes no yes no yes yes yes

intersect with wordset yes yes no no no no no no yes

plugins extension yes no no no no no no no no

indexed database SQLite Berkeley DB MongoDB raw file MongoDB raw file SQLite SQLite raw file

data encryption no∗∗ no no no yes no no no no

language Py3/Qt Py2/HTML JS/HTML C++/Qt Node.js Java Py3 Py3 Java

pedigree file yes no no no no no yes yes yes

application type desktop web web web web desktop console console console

multi-users support no no no no yes no no no

CVS/Excel export yes yes yes yes yes no yes yes yes

∗Support LIKE SQL expression

∗∗Possible with SQLITE encryption extension

Table 2. Comparaison of time performance between cutevariant and VCF-miner for importation and query execution. The query used filters the

variants with QUAL ≥30 and DEPTH ≥ 30. Executed on Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz with 16Gb RAM

input file 1KG.chr22.anno.vcf corpas.quartlet.vcf NA12878.vcf

variant count 494’328 300’035 3’775’119

sample count 1092 4 1

software cutevariant VCF-miner cutevariant VCF-miner cutevariant VCF-miner

importation time 6600s 2940s 78s 183s 810s 2220s

query execution time* ≈ 1s ≈ 1s 0.02s ≈ 1s 0.02s ≈ 1s

Use case 1: Sars-CoV-2-Analysis

In the context of the Covid-19 pandemia, we have tested

Cutevariant to identify mutations along the genome of the Sars-

Cov-2 virus. For this, we have downloaded from the ENA

database, a dataset (PRJNA673096) with 245 samples stored

in a Fastq file produced by the Illumina sequencing plateform

using an amplicon librarie. The pipeline is available on github

[21].The data originate from the US Delaware Public Health

Laboratory. Fastq files have been aligned on the NC045512.2

genome of Sars-CoV-2 with the BWA software [22]. Variants

have been called with the FreeBayes application [23] and

all 245 samples have been merged into one single VCF file

annotated with SnpEff[24]. This file has been imported into

Cutevariant for exploration. We executed a VQL statements

(Fig. 4) to extract variants within the gene S and sorted the

result by count var annotation showing the total number of

samples carrying the variant. The sorting process is easily

done by clicking on the corresponding header of the view. The

mutation p.asp614Gly (highlighted in Fig. 4) is found in 239

samples out of 245. This variant has already been described

[25] as a dominant one emerging at the beginning of the

pandemia. In the same way, by scrutinizing all the genes, we

have identified two others mutation: (ORF1ab)p.Thr265Ile and

(ORF3a)p.Gln57His which are exclusive to the North American

population [26].

Fig. 4: Mutation found in gene S of Sars-Cov-2 by a Cutevariant

analysis of 245 samples.

Use case 2: Cohort analysis

We have repeated with Cutevariant the analysis given as an

example by SnpSift [27]. It is a cohort analysis of 17 individuals

among which 3 are affected by a nonsense mutation in the

CFTR gene (G542*). This analysis cannot be performed with

any of the graphics application listed previously (Table 1). After

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619


6 Short Article Title

importing the annotated VCF file and the corresponding PED

file, the following VQL query was processed by Cutevariant

selecting variants with HIGH impact which are homozygous in

case samples but are not in control samples. SnpSift uses the

following query:

cat protocols/ex1.ann.cc.vcf \

| java -jar SnpSift.jar filter \

(Cases[0]=3) & (Controls[0]=0)

((ANN[*].IMPACT='HIGH')|\

(ANN[*].IMPACT='MODERATE')) \

> protocols/ex1.filtered.vcf

The Cutevariant equivalent VQL query providing the same

results reads as:

SELECT chr, pos FROM variants

WHERE case_count_hom=3 control_count_hom=0

AND impact

IN ('HIGH', 'MODERATE')

Discussion

Performance

Cutevariant is implemented within the open-source Qt for

Python [28] that provides a set of Python bindings to build

modern user interface. Instead of using native Qt/C++ as

coding language, we have opted for Python because it is

by far the most frequently used coding language in the

bioinformatics community. This choice does not cause any

significant performance degradation of the Cutevariant GUI.

Execution time for queries performed on a complete genome

with many filters can become particularly slow. This long

execution time is primarily due to the SQL COUNT statement

which browses through all the variants to calculate the total

number of variants. The table JOIN statement is also time

consuming. This is the consequence of the choice made for

Curevariant, unlike GEMINI, to store samples and a few

annotations in separate tables to avoid table denormalization

and to minimize disk space occupation. This time penalty has

been minimized on one hand by using a memory cache so that

identical VQL queries do not need to recalculate the count of

variants and, on the other hand, by using asynchronous queries

performed in dedicated threads, thus avoiding to freeze the GUI

with the progress bar showing the loading status.

Web app vs Desktop app

Cutevariant is a serverless desktop application and therefore

does not provide annotation- or multiuser-features. The

annotation step must be carried out upstream at the end of

an analysis pipeline by using dedicated tools such as SnpSift

or VEP. Multi-users capabilities allow users to share custom

annotations and comments. For instance, a user marks a variant

as pathogenic and this information is shared among all users.

Although this feature is not supported by Cutevariant, it can be

delegated to other tools such as MyVariant.info [29]. It provides

a database of variants with which Cutevariant can communicate

through a REST API. These data can then be used as a source

of annotation in the annotation step of the pipeline.

A general purpose and customizable tool

Cutevariant is a general purpose tool to filter variants and is

fully customizable thanks to its plugin-based implementation

and thus offers features and modularity that are not available

with existing applications. Since Cutevariant is not specific to

the analysis of the human genome, it can be use with any VCF

file as we demonstrated here with the Sars-Cov-2 example. GUI

options dedicated to specific tasks are not hard coded in the

application but can easily be added to Cutevariant by creating

new plugins. As an example of such added GUI options, the

Trio Analysis plugin selected from the Tools menu users to build

from the GUI a VQL filter including transmission mode and the

family tree.

Conclusion

Cutevariant is a new desktop application devoted to explore

genetic variations in VCF data provided by next generation

sequencing. It is the first GUI software of the kind that

integrates both a user friendly graphical user interface and

a domain specific language. Starting from a low learning

threshold, end-users can easily perform complex filtering to

identify variants of interest. Cutevariant is a standalone

application that runs on standard desktop computers either

under Linux, MacOS or Windows operating systems. The

python-based plugins architecture makes the application easily

expandable with the addition of new features, thus offering the

possibility to involve the biocomputer scientists community at

large in new features developments.

Acknowledgments

We would like to thank Lucas Bourneuf and Pierre Vignet for

contribution to the development.

Funding

This work has been supported by UBO, Université de Bretagne

Occidentale, France.

Conflict of Interest: none declared

References

1. Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A.

Albers, Eric Banks, Mark A. DePristo, Robert E.

Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T.

Sherry, Gilean McVean, and Richard Durbin. The variant

call format and VCFtools. Bioinformatics, 27:2156–2158, 8

2011.

2. William McLaren, Laurent Gil, Sarah E. Hunt,

Harpreet Singh Riat, Graham R.S. Ritchie, Anja Thormann,

Paul Flicek, and Fiona Cunningham. The ensemble variant

effect predictor. Genome Biology, 17:1–14, 6 2016.

3. Pablo Cingolani, Adrian Platts, Le Lily Wang, Melissa

Coon, Tung Nguyen, Luan Wang, Susan J. Land, Xiangyi

Lu, and Douglas M. Ruden. A program for annotating and

predicting the effects of single nucleotide polymorphisms,

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619


Short Article Title 7

SnpEff: SNPs in the genome of drosophila melanogaster

strain w1118; iso-2; iso-3. Fly, 6:80–92, 2012.

4. Umadevi Paila, Brad A. Chapman, Rory Kirchner, and

Aaron R. Quinlan. GEMINI: Integrative exploration

of genetic variation and genome annotations. PLoS

Computational Biology, 9, 7 2013.

5. Gao T. Wang, Bo Peng, and Suzanne M. Leal. Variant

association tools for quality control and analysis of large-

scale sequence and genotyping array data. American

Journal of Human Genetics, 94:770–783, 5 2014.

6. Richard D Hipp. SQLite.

https://www.sqlite.org/index.html, 2020.

7. Heng Li. A statistical framework for SNP calling,

mutation discovery, association mapping and population

genetical parameter estimation from sequencing data.

Bioinformatics, 27(21):2987–2993, 09 2011.

8. Anne-Sophie Lebre and Jean-Marc Rey. SeqOne.

https://seq.one/, Jan 2021.

9. Manuel Holtgrewe, Oliver Stolpe, Mikko Nieminen, Stefan

Mundlos, Alexej Knaus, Uwe Kornak, Dominik Seelow,

Lara Segebrecht, Malte Spielmann, Björn Fischer-Zirnsak,

Felix Boschann, Ute Scholl, Nadja Ehmke, and Dieter

Beule. VarFish: comprehensive DNA variant analysis

for diagnostics and research. Nucleic Acids Research,

48(W1):W162–W169, 04 2020.

10. Steven N. Hart, Patrick Duffy, Daniel J. Quest, Asif

Hossain, Mike A Meiners, and Jean-Pierre Kocher.

VCF-Miner: GUI-based application for mining variants

and annotations stored in VCF files. Briefings in

Bioinformatics, 17(2):346–351, 07 2015.

11. et al. W James Kent. The human genome browser at UCSC.

Genome Res., 12(6):996–1006, 06 2002.

12. Heiko Müller, Raul Jimenez-Heredia, Ana Krolo, Tatjana

Hirschmugl, Jasmin Dmytrus, Kaan Boztug, and Christoph

Bock. VCF.Filter: interactive prioritization of disease-

linked genetic variants from sequencing data. Nucleic Acids

Research, 45(W1):W567–W572, 05 2017.

13. Empowering app development for developers.

https://www.docker.com/.

14. Mark Ziemann, Yotam Eren, and Assam El-Osta. Gene

name errors are widespread in the scientific literature.

Genome Biology, 17, 8 2016.

15. Pablo Cingolani, Fiona Cunningham, Will Mclaren,

and Kai Wang. Variant annotations in VCF format.

http://www.ensembl.org/Help/Glossary?id=492.

16. Adrian Tan, Gonçalo R. Abecasis, and Hyun Min Kang.

Unified representation of genetic variants. Bioinformatics,

31(13):2202–2204, 02 2015.

17. I. Dejanović, R. Vaderna, G. Milosavljević, and Vuković.

TextX: A Python tool for Domain-Specific Languages

implementation. Knowledge-Based Systems, 115:1–4, 1

2017.

18. Cutevariant. https://github.com/labsquare/cutevariant.

19. Pytest. https://docs.pytest.org/en/stable.

20. Python Package Index. https://pypi.org/.

21. githubcovid.https : //github.com/dridk/Sars−CoV − 2 −
NGS − pipeline.

22. Heng Li and Richard Durbin. Fast and accurate short read

alignment with Burrows-Wheeler transform. Bioinformatics,

25:1754–1760, 7 2009.

23. Erik Garrison and Gabor Marth. Haplotype-

based variant detection from short-read sequencing.

http://arxiv.org/abs/1207.3907, 7 2012.

24. P. Cingolani, A. Platts, M. Coon, T. Nguyen, L. Wang, S.J.

Land, X. Lu, and D.M. Ruden. A program for annotating and

predicting the effects of single nucleotide polymorphisms, snpeff:

Snps in the genome of drosophila melanogaster strain w1118;

iso-2; iso-3. Fly, 6(2):80–92, 2012.

25. Bette Korber, Will M. Fischer, Sandrasegaram Gnanakaran,

Hyejin Yoon, James Theiler, Werner Abfalterer, Nick

Hengartner, Elena E. Giorgi, Tanmoy Bhattacharya, Brian

Foley, Kathryn M. Hastie, Matthew D. Parker, David G.

Partridge, Cariad M. Evans, Timothy M. Freeman, Thushan I.

de Silva, Adrienne Angyal, Rebecca L. Brown, Laura

Carrilero, Luke R. Green, Danielle C. Groves, Katie J.

Johnson, Alexander J. Keeley, Benjamin B. Lindsey, Paul J.

Parsons, Mohammad Raza, Sarah Rowland-Jones, Nikki Smith,

Rachel M. Tucker, Dennis Wang, Matthew D. Wyles, Charlene

McDanal, Lautaro G. Perez, Haili Tang, Alex Moon-Walker,

Sean P. Whelan, Celia C. LaBranche, Erica O. Saphire, and

David C. Montefiori. Tracking changes in sars-cov-2 spike:

Evidence that d614g increases infectivity of the covid-19 virus.

Cell, 182:812–827.e19, 8 2020.

26. Xumin Ou, Zhishuang Yang, Dekang Zhu, Sai Mao, Mingshu

Wang, Renyong Jia, Shun Chen, Mafeng Liu, Qiao Yang,

Ying Wu, Xinxin Zhao, Shaqiu Zhang, Juan huang, Qun Gao,

Yunya Liu, Ling Zhang, Maikel Peopplenbosch, Qiuwei Pan,

and Anchun Cheng. Tracing two causative snps reveals sars-

cov-2 transmission in north america population. bioRxiv, page

2020.05.12.092056, 5 2020.

27. Snpeff usage example. https://pcingola.github.io/SnpEff/examples/.

28. The Qt Company. Qt for Python: The official Python bindings

for Qt. https://www.qt.io/qt-for-python.

29. Variant annotation as a service. https://myvariant.info/.

30. Silvia Salatino and Varun Ramraj. BrowseVCF: a web-based

application and workflow to quickly prioritize disease-causative

variants in VCF files. Briefings in bioinformatics, 18:774–779,

9 2017.

31. Steven N. Hart, Patrick Duffy, Daniel J. Quest, Asif Hossain,

Mike A. Meiners, and Jean Pierre Kocher. VCF-Miner: GUI-

based application for mining variants and annotations stored in

VCF files. Briefings in Bioinformatics, 17:346–351, 3 2016.

32. Jianping Jiang, Jianlei Gu, Tingting Zhao, and Hui Lu. VCF-

Server: A web-based visualization tool for high-throughput

variant data mining and management. Molecular Genetics and

Genomic Medicine, 7, 7 2019.

33. F. Anthony San lucas, Gao Wang, Paul Scheet, and

Bo Peng. Integrated annotation and analysis of genetic variants

from next-generation sequencing studies with variant tools.

Bioinformatics, 28:421–422, 2 2012.

34. The Qt Company. Cross-platform software development for

embedded and desktop. https://www.qt.io/.

35. Manuel Holtgrewe, Oliver Stolpe, Mikko Nieminen, Stefan

Mundlos, Alexej Knaus, Uwe Kornak, Dominik Seelow, Lara

Segebrecht, Malte Spielmann, Björn Fischer-Zirnsak, Felix

Boschann, Ute Scholl, Nadja Ehmke, and Dieter Beule.

VarFish: comprehensive DNA variant analysis for diagnostics

and research. Nucleic acids research, 48:W162–W169, 7 2020.

36. Damian Smedley, Julius O B Jacobsen, Marten Jager, Sebastian

Köhler, Manuel Holtgrewe, Max Schubach, Enrico Siragusa,

Tomasz Zemojtel, Orion J Buske, Nicole L Washington,

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619


8 Short Article Title

William P Bone, Melissa A Haendel, and Peter N Robinson.

Next-generation diagnostics and disease-gene discovery with the

Exomiser. Nature protocols, 10:2004, 2015.

37. DNA sequencing. https://www.integragen.com/service-

solutions/dna-sequencing, Oct 2020.

38. Adrian Tan, Gonçalo R. Abecasis, and Hyun Min Kang.

Unified representation of genetic variants. Bioinformatics,

31:2202–2204, 7 2015.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430619doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430619

	Introduction
	Materials and methods
	VCF file importation and preprocessing
	User interface layout
	Variant Query Language (VQL)
	Filter expressions
	Group variants
	Set operation

	Plugins architectures
	Technical details and continuous integration

	Results
	Use case 1: Sars-CoV-2-Analysis
	Use case 2: Cohort analysis

	Discussion
	Performance
	Web app vs Desktop app
	A general purpose and customizable tool

	Conclusion
	Acknowledgments
	Funding