Formally analysing the concepts of domestic violence

Jonas Poelmans¹, Paul Elzinga³, Stijn Viaene¹,², Guido Dedene¹,⁴

¹ K.U.Leuven, Faculty of Business and Economics, Naamsestraat 69, 3000 Leuven, Belgium
² Vlerick Leuven Gent Management School, Vlamingenstraat 83, 3000 Leuven, Belgium
³ Amsterdam-Amstelland Police, James Wattstraat 84, 1000 CG Amsterdam, The Netherlands
⁴ Universiteit van Amsterdam Business School, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands

{Jonas.Poelmans, Stijn.Viaene, Guido.Dedene}@econ.kuleuven.be
Paul.Elzinga@amsterdam.politie.nl

Abstract. The types of police inquiries performed these days are incredibly diverse. Often data processing architectures are not suited to cope with this diversity since most of the case data is still stored as unstructured text. In this paper Formal Concept Analysis (FCA) is showcased for its exploratory data analysis capabilities in discovering domestic violence intelligence from a dataset of unstructured police reports filed with the regional police Amsterdam-Amstelland in the Netherlands. From this data analysis it is shown that FCA can be a powerful instrument to operationally improve policing practice. For one, it is shown that the definition of domestic violence employed by the police is not always as clear as it should be, making it hard to use it effectively for classification purposes. In addition, this paper presents newly discovered knowledge for automatically classifying certain cases as either domestic or non-domestic violence. Moreover, it provides practical advice for detecting incorrect classifications performed by police officers. A final aspect to be discussed is the problems encountered because of the sometimes unstructured way of working of police officers. The added value of this paper resides both in using FCA for exploratory data analysis and in the application of FCA for the detection of domestic violence.
Keywords: Formal Concept Analysis (FCA), domestic violence, knowledge discovery in databases, text mining, exploratory data analysis, knowledge enrichment, concept discovery

1 Introduction

Concept discovery is a relatively new approach for discovering knowledge from textual information [10]. At the core of the method is the visualization of the underlying concepts of the data by means of Formal Concept Analysis (FCA) lattices [8, 9], which are interpreted, analysed and discussed by domain experts. FCA arose twenty-five years ago as a mathematical theory [14] and has over the years grown into a powerful framework for data analysis, data visualization [15], information retrieval and text mining [16, 17, 20]. In this paper FCA is for the first time used as an exploratory data analysis and knowledge enrichment technique for police data. Compared to traditional black-box data mining techniques, this human-centred approach has the advantage of actively engaging expert knowledge in the discovery process.

The goal of Intelligence Led Policing (ILP) is to complement intuition-led police actions with information coming from analyses on aggregated operational data, such as crime figures and criminal characteristics [25, 26, 39]. While over 80% of all information available to police organizations resides in textual form, analysis has to date been primarily focused on the structured portion of the available data. Though text mining has been identified as a promising area in the formal framework for crime data mining by Chen et al. [27], this work has hardly found its way into mainstream scientific literature. One of the notable exceptions is the paper by Ananyan [28], in which historical police reports were analysed to identify hidden patterns.
According to the Ministry of Justice of the Netherlands, 45% of the population once fell victim to non-incidental domestic violence and for 27% of the population, the incidents even occurred on a weekly or daily basis [22]. These gloomy statistics brought this topic to the centre of the political agenda and made it one of the pivotal projects of the Balkenende administration when it took office in 2003¹ and of the Amsterdam-Amstelland police in the Netherlands [32]. Sufficient insight into the nature of domestic violence, together with the ability to swiftly recognise suspicious cases and label reports accordingly, is of the utmost importance. However, intensive audits of the police databases of filed reports have established that many reports tended to be wrongly labelled as domestic or as non-domestic violence cases.

In this paper we shall demonstrate the effectiveness of concept discovery methods for distilling new knowledge from the unstructured text in police reports. Amongst other things, FCA helped us to improve the definition of the notion of domestic violence, its understanding by police officers and its management. Additionally, we aim at automating the detection of domestic violence from the unstructured text in police reports. The very first steps taken in this direction are described in [37], while [38] describes an independent research track, based on Emergent Self Organizing Maps, that was pursued in parallel with the work presented in this paper. Although the usage of FCA for browsing text collections has been suggested before by Cole et al. [18, 35], almost none of these papers have focused on how FCA can be used for knowledge enrichment and for discovering different types of knowledge in unstructured text. Neither has it been thoroughly discussed in the literature how FCA can be used to incrementally construct and refine a high-quality domain-specific thesaurus (which is a prerequisite for developing an effective information retrieval system).
Moreover, only minor attention has been paid to the possibilities offered by FCA to incorporate prior knowledge in the knowledge discovery process. Finally, some of the aspects of this paper have already been discussed in the literature in a fragmented way (e.g. information retrieval, knowledge browsing), but an integrated approach has never been pursued.

¹ http://www.regering.nl/Het_kabinet/Eerdere_kabinetten/Kabinet_Balkenende_II/Regeerakkoord#internelink4

FCA is particularly suited for exploratory data analysis because of its human-centredness. Representations that expose the underlying conceptual structure of the information promote the creation of new knowledge. What makes FCA an especially appealing technique for knowledge discovery in databases from a practitioner’s point of view is the compactness of its information representation and the minimal need for users to tune (hyper)parameters to distill a useful, actionable picture of the mining exercise. Concepts are the elementary units of human reasoning and this notion of concept is central to FCA [23, 24]. The underlying structure of the information is considered to be a concept system and FCA concept lattices are used to visualize the concepts and their interrelationships. These visual representations support human actors in their information discovery and knowledge creation exercise.

This paper is composed as follows. In section 2 we describe the current situation of the domestic violence reporting procedure and previous attempts to improve the situation. In section 3 we cover the essentials of FCA theory, introducing the pivotal FCA notions of concept and concept lattice and describing the process of FCA for knowledge discovery. Section 4 elaborates on the dataset used in our research, while section 5 focuses on how this dataset was analysed and discusses the results of the application of FCA for exploratory analysis of domestic violence cases using this dataset.
In section 6 the results of the domain exploration are validated. Finally, section 7 presents a number of concluding remarks.

2 Domestic violence discovery

According to the U.S. Office on Violence against Women, domestic violence is a “pattern of abusive behavior in any relationship that is used by one partner to gain or maintain power and control over another intimate partner” [1]. Domestic violence can take the form of physical violence, which includes biting, pushing, maltreating, stabbing or even killing the victim. Physical violence is often accompanied by mental or emotional abuse, which includes insults and verbal threats of physical violence towards the victim, the self or others, including children. Domestic violence occurs all over the world, in various cultures [2] and affects people throughout society, irrespective of economic status [3].

2.1 Current situation

The XPol database, the database of the Amsterdam-Amstelland police, contains most of the documents with regard to criminal offences. Documents related to certain types of crime receive corresponding labels. It is of the utmost importance that a correct label is assigned to each of the filed police reports. First, there are some legal consequences. If the police judged an incident to be domestic violence, the public prosecutor can accuse the offender of committing a domestic violence crime. This is taken into account by the judge as an aggravating circumstance, often resulting in a more severe penalty. Second, police officers will be able to better assess new incidents between the perpetrator and the victim, resulting in a more effective way of tackling the problem. Finally, if a domestic violence label was incorrectly assigned to a case, this will result in a waste of the valuable time of the police officers assigned to the case.

Immediately after the reporting of a crime, police officers are given the possibility to judge whether or not it is a domestic violence case.
If they believe it is, they can indicate this by assigning the label “domestic violence” to the report. However, not all domestic violence cases are recognised as such by police officers. This may have several reasons, for example a lack of training, a lack of prior experience or new types of domestic violence occurring. As a consequence, many documents are lacking the appropriate label, which has put on the agenda the need for a more efficient and effective case triage software program to automatically filter out suspicious cases for in-depth manual inspection and classification. The in-place case triage system has been configured to filter out these reports for in-depth manual inspection and classification, with the aim of substantially reducing the number of domestic violence cases that are not recognised as such. It retrieves suspicious cases that lack the label of domestic violence and sends them back to the data quality management team. At present, each case retrieved by the in-place case triage system is subjected to an in-depth manual inspection by one of the co-workers of the quality control department. If analysis reveals that a case was wrongly classified as non-domestic violence, it is sent back to the police officer responsible for the case, who is obliged to re-examine and reclassify the police report. It is obvious that this is a very time-consuming and, consequently, costly procedure. Given that it takes an individual at least five minutes to read and classify a case, it is clear that more accurate triage will result in major savings.

Currently the triage is based on either one or both of the following two criteria being met. The first criterion is whether the perpetrator and the victim live at the same address.
The second criterion is whether any one or a combination of the following expressions appears in the case documents: “ex-boyfriend”, “ex-girlfriend”, “ex-husband”, “ex-wife”, “domestic”, “stalk”, “lived together”, “live together”, “son and scared”, “child and scared”, “child and threat”, “son and threat”, “daughter and threat” or “daughter and scared”.

Fig. 1. Current domestic violence reporting procedure

A summary of the current domestic violence reporting procedure is displayed in Figure 1. There are several problems associated with this process. First, recent audits have confirmed that many of the retrieved cases are wrongly selected for in-depth manual inspection. In 2006, for example, the system retrieved 1157 cases, 80% of which actually turned out to be non-domestic violence cases, and in 2007 it retrieved 1091 such cases in which the victim made a statement to the police. Second, because of a lack of manpower the data quality management team was not able to analyse each retrieved police report. Third, audits of the police databases revealed that not all domestic violence cases lacking the appropriate label were retrieved by the case triage system. Fourth, no actions have yet been undertaken to address the issue of the filed reports that were wrongly classified as domestic violence.

2.2 Previous attempts to resolve the situation

Previous attempts have mainly focused on developing a machine learning classifier that automatically classified cases as domestic or as non-domestic violence. Previously developed systems were mainly multi-layer perceptrons that were trained on a dataset consisting of cases that were labelled by police officers as domestic or as non-domestic violence. These systems did not provide any insight into the problem, since they are black boxes, and their performance was only around 80% [31]. As a consequence, these systems never made it into operational policing practice.
We found that a critical error was that the developers never performed an in-depth exploration of the data. They overlooked the complexity of the notion of domestic violence, were unaware that different people have different views on its nature and scope, and did not pay attention to niche cases. Moreover, the correctness of the labels assigned to cases by police officers was never verified. We found that different police officers regularly assigned different labels to the same situation. Finally, the developers did not have at their disposal a high-quality domain-specific thesaurus that contained sufficient discriminant terms for accurately classifying cases.

3 FCA knowledge discovery process

This section introduces the main ideas of FCA and how it was used during the knowledge discovery process. According to R.S. Brachman and T. Anand [29], much attention and effort has been focused on the development of data mining techniques, but only a minor effort has been devoted to the development of tools that support the analyst in the overall discovery task. They argue for a more human-centred approach. Human-centred KDD refers to the constitutive character of human interpretation for the discovery of knowledge, and stresses the complex, interactive process of KDD as being led by human thought. In most real-world knowledge discovery applications, an indispensable part of the discovery process is that the analyst explores and sifts through the raw data to become familiar with it and to get a feel for what the data may cover. Often an explicit specification of what one is looking for only arises during an interactive process of data exploration, analysis and segmentation. R.S. Brachman et al. [30] introduce the notion of data archeology for KDD tasks in which a precise specification of the discovery strategy, the crucial questions and the basic goals of the task have to be elaborated during an unpredictable exploration of the data.
Data archeology can be considered as a highly human-centred process of asking, exploring, analysing, interpreting and learning by interacting with the underlying database. Comprehensible support should be provided to the analyst during the KDD process. According to Brachman et al. [29] this should be embedded into a knowledge discovery support environment.

How the process of human-centred KDD can be supported by Formal Concept Analysis (FCA) was for the first time investigated by Stumme et al. [12]. Smyth et al. [33] already stated that the algorithm designer and the scientist should be able to bring in prior knowledge so the data mining algorithm does not just rediscover what is already known. Moreover, the scientist should be able to “get inside” and “steer” the direction of the data mining algorithm. FCA fulfils these requirements. Starting from initial knowledge on the problem area, it provides the user with a visual display of the relevant concepts available in the dataset and their relationships. Additionally, the user can visually interact with the concept lattice and thereby steer the knowledge discovery process.

What makes FCA an especially appealing technique for knowledge discovery in databases is that it meets the important requirement stated by, amongst others, Fayyad et al. [34] that data mining should be primarily concerned with making it easy, convenient and practical to explore very large databases for organizations and users with vast amounts of data but without years of training as data analysts. FCA offers the user an intuitive visual display of different types of structures available in the dataset and guides the user in the exploration of the dataset. This end-user-friendly interface also makes the data mining more transparent to the user. When compared to other, more traditional, techniques such as association rules, FCA has a larger explanatory power because of its underlying non-hierarchical structure [36].
While traditional association rules are flat, FCA provides an order of significance, which makes its representation richer and more intuitive to use.

3.1 FCA essentials

Formal Concept Analysis is a recent mathematical technique that can be used as an unsupervised clustering technique [11, 13]. Police reports containing terms from the same term clusters are grouped in concepts. The starting point of the analysis is a database table consisting of rows $M$ (i.e. objects), columns $F$ (i.e. attributes) and crosses $T \subseteq M \times F$ (i.e. relationships between objects and attributes). The mathematical structure used to represent such a cross table is called a formal context $(T, M, F)$. An example of a cross table is displayed in Table 1. In this table reports of domestic violence (i.e. the objects) are related (i.e. the crosses) to a number of terms (i.e. the attributes); here a report is related to a term if the report contains this term. The dataset in Table 1 is an excerpt of the one we used in our research. Given a formal context, FCA then derives all concepts from this context and orders them according to a subconcept-superconcept relation, which results in a line diagram (a.k.a. lattice).

Table 1. Example of a formal context
Attributes (columns): kicking, dad hits me, stabbing, cursing, scratching, maltreating
report 1: X X X
report 2: X X X
report 3: X X X X X
report 4: X
report 5: X X

The notion of concept is central to FCA. The way FCA looks at concepts is in line with the international standard ISO 704, which formulates the following definition. A concept is considered to be a unit of thought constituted of two parts: its extension and its intension [14, 16]. The extension consists of all objects belonging to the concept, while the intension comprises all attributes shared by those objects. Let us illustrate the notion of concept of a formal context using the data in Table 1.
For a set of objects $O \subseteq M$, the common features, written $\sigma(O)$, can be identified via the following formula:

$$A = \sigma(O) = \{ f \in F \mid \forall o \in O : (o, f) \in T \}$$

Take the attributes that describe report 5 in Table 1, for example. By collecting all reports of this context that share these attributes, we get to a set $O \subseteq M$ consisting of reports 2, 3 and 5. This set $O$ of objects is closely connected to set $A$ consisting of the attributes “cursing” and “scratching”:

$$O = \tau(A) = \{ i \in M \mid \forall f \in A : (i, f) \in T \}$$

That is, $O$ is the set of all objects sharing all attributes of $A$, and $A$ is the set of all attributes that are valid descriptions for all the objects contained in $O$. Each such pair $(O, A)$ is called a formal concept (or concept) of the given context. The set $A = \sigma(O)$ is called the intent, while $O = \tau(A)$ is called the extent of the concept $(O, A)$.

There is a natural hierarchical ordering relation between the concepts of a given context that is called the subconcept-superconcept relation:

$$(O_1, A_1) \subseteq (O_2, A_2) \Leftrightarrow O_1 \subseteq O_2 \; (\Leftrightarrow A_2 \subseteq A_1)$$

A concept $d = (O_1, A_1)$ is called a subconcept of a concept $e = (O_2, A_2)$ (or equivalently, $e$ is called a superconcept of $d$) if the extent of $d$ is a subset of the extent of $e$ (or equivalently, if the intent of $d$ is a superset of the intent of $e$). For example, the concept with intent “cursing”, “scratching” and “stabbing” is a subconcept of a concept with intent “cursing” and “scratching”. With reference to Table 1, the extent of the latter is composed of reports 2, 3 and 5, while the extent of the former is composed of reports 2 and 3.

The set of all concepts of a formal context combined with the subconcept-superconcept relation defined for these concepts gives rise to the mathematical structure of a complete lattice, called the concept lattice of the context, which is made accessible to human reasoning by using the representation of a (labelled) line diagram.
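The two derivation operators and the ordering relation defined above can be illustrated with a minimal Python sketch. It is not the tooling used in this research. The function names `sigma`, `tau`, `all_concepts` and `is_subconcept` are chosen here for illustration, and the toy context is partly assumed: only the rows for reports 2 and 5 follow directly from the worked example in the text, while the rows for reports 1, 3 and 4 are hypothetical values picked to match the number of crosses per report. The brute-force enumeration is only practical for very small contexts.

```python
from itertools import combinations

# Toy formal context: report -> set of terms it contains.
# Reports 2 and 5 follow the worked example in the text; the rows for
# reports 1, 3 and 4 are ASSUMED for illustration only.
CONTEXT = {
    "report 1": {"kicking", "dad hits me", "maltreating"},
    "report 2": {"stabbing", "cursing", "scratching"},
    "report 3": {"kicking", "dad hits me", "stabbing", "cursing", "scratching"},
    "report 4": {"maltreating"},
    "report 5": {"cursing", "scratching"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def sigma(objects):
    """Intent: the attributes shared by every object in `objects`."""
    objects = set(objects)
    if not objects:
        # By convention the empty object set shares all attributes.
        return ATTRIBUTES
    return frozenset.intersection(*(frozenset(CONTEXT[o]) for o in objects))


def tau(attributes):
    """Extent: the objects that carry every attribute in `attributes`."""
    attributes = set(attributes)
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def all_concepts():
    """Enumerate all formal concepts by closing every subset of objects."""
    found = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = sigma(subset)
            found.add((tau(intent), intent))
    return found


def is_subconcept(d, e):
    """d = (O1, A1) is a subconcept of e = (O2, A2) iff O1 is a subset of O2."""
    return d[0] <= e[0]


if __name__ == "__main__":
    for extent, intent in sorted(all_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), "<->", sorted(intent))
```

Calling `tau(sigma({"report 5"}))` reproduces the step in the text: the attributes of report 5 are collected first, and then all reports sharing them are retrieved.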
The line diagram in Figure 2, for example, is a compact representation of the concept lattice of the formal context abstracted from Table 1. The circles or nodes in this line diagram represent the formal concepts. It displays only concepts that describe objects and is therefore a subpart of the concept lattice. The shaded boxes (upward) linked to a node represent the attributes used to name the concept. The non-shaded boxes (downward) linked to a node represent the objects used to name the concept. The information contained in the formal context of Table 1 can be distilled from the line diagram in Figure 2 by applying the following reading rule: an object “g” is described by an attribute “m” if and only if there is an ascending path from the node named by “g” to the node named by “m”. For example, report 5 is described by the attributes “cursing” and “scratching”.

Fig. 2. Line diagram corresponding to the context from Table 1

Retrieving the extension of a formal concept from a line diagram such as the one in Figure 2 implies collecting all objects on all paths leading down from the corresponding node. In this example, the objects associated with the third concept in row 3 are reports 2 and 3. To retrieve the intension of a formal concept, one traces all paths leading up from the corresponding node in order to collect all attributes. In this example the third concept in row 3 is defined by the attributes “stabbing”, “cursing” and “scratching”. The top and bottom concepts in the lattice are special: the top concept contains all objects in its extension, whereas the bottom concept contains all attributes in its intension. A concept is a subconcept of all concepts that can be reached by travelling upward. This concept will inherit all attributes associated with these superconcepts. Note that the extension of the concept with attributes “kicking” and “dad hits me” is empty. This does not mean that there is no report that contains these attributes.
However, it does mean that there is no report containing only these two attributes.

3.2 Human-centred knowledge discovery with FCA

In contrast to most data mining algorithms, the discovery process using FCA is human-centred. It is definitely not a black box that runs and optimises without intervention beyond specifying initial model choices and parameters. During the mining process two persons, an exploratory data analyst and a domain expert, were the driving force behind the exploration and collaborated intensively. There was a continuous process of iterating back and forth between the FCA lattices and the police reports. This knowledge discovery process is summarised in Figure 3. The methodology is described here in abstract terms; the process will be exemplified in the results section.

Fig. 3. Abstract human-centered FCA knowledge discovery process

The process of using FCA for exploratory data analysis basically consists of iteratively applying the following steps. A lattice is constructed by the exploratory data analyst based on the domain expert’s prior knowledge of the problem area, the police reports contained in the dataset and the terms contained in the thesaurus. The lattice provides a reduced search space to the domain expert, who then visually inspects and analyses this lattice, paying special attention to anomalies and counter-intuitive facts. The latter provide a clear guideline to the exploratory data analyst and the domain expert in order for them to pursue their data exploration. The obtained results, together with the relevant prior knowledge of the domain expert, are then incorporated into the existing visual representation, resulting in a new lattice. The FCA lattice can be considered as a knowledge browser. Our contention is that it allows for an effective interaction between the human actors and the underlying information.
The focus of the use of the FCA technique is on truly gaining incremental insight into the problem area by optimally incorporating prior domain knowledge in learning cycles. This insight encompasses an enrichment as well as a validation of the correctness and the practical usefulness of existing prior knowledge. Additionally, FCA is used to enrich and refine the domain-specific thesaurus. This thesaurus plays a key role in the incremental knowledge discovery germane to our research. FCA is also used to discover missing values and inconsistencies in police reports. Finally, FCA is used to investigate some important aspects of operational policing practice concerning domestic violence cases and to discover accurate and comprehensible classification rules. Each of these aspects of the process will be described in more detail in section 5, where we comment on the empirical analysis and results.

4 Dataset

The dataset we report on in this paper consists of a selection of 4814 police reports describing a whole range of violent incidents from the year 2007. The domestic violence cases for that period are a subset of this dataset. The 1091 cases selected by the in-place case triage system for 2007 are a subset of this dataset too. This latter selection came about by, amongst other things, filtering out from a larger set those police reports that did not contain the reporting of a crime by a victim, which is necessary for establishing domestic violence. This happens, for example, when a police officer is sent to an incident and later on writes a report in which he/she mentions his/her findings, while the victim has not made an official statement to the police. The follow-up reports referring to previous cases were also removed from the initial set of reports. Ultimately, this gave rise to a set of 4814 reports that were used as input for our investigation.
From these reports, the person who reported the crime, the suspect, the persons involved in the crime, the witnesses, the project code and the statement made by the victim to the police were extracted. Of the 4814 reports, 1657 were classified as domestic violence; the others were not. An example of a report is displayed in Figure 4.

Title of incident: Violent incident xxx
Reporting date: 26-11-2007
Project code: Domestic violence against seniors (+55)
Crime location: Amsterdam Keizersgracht yyy
Suspect (male), Suspect (18-45yr): zzz
Address: Amsterdam Keizersgracht yyy
Involved (male), Involved (18-45yr): Neighbours
Address: Amsterdam Keizersgracht www
Victim (female), Victim (older than 45yr): uuu
Address: Amsterdam Keizersgracht vvv
Reporting of the crime: Last night I was attacked by my husband. I was watching television in the living room when he suddenly attacked me with a knife. I fell on the floor. Then he tried to kick me in my stomach. I tried to escape through the back door while I was yelling for help. I ran to the neighbours for help. They called the emergency services. Meanwhile my son ran away. My leg was bleeding; my head was bouncing, etc.

Fig. 4. Example police report

The validation set consists of a selection of 4738 cases describing a whole range of violent incidents from the year 2006 where the victim made a statement to the police. Again, the follow-up reports were first removed. Of these 4738 cases 1734 were classified as domestic violence by police officers. In 2006 the in-place case triage system retrieved 1157 police reports containing a statement made by the victim that had to be manually classified by police officers. Of these, 318 were classified as domestic violence, while 839 were classified as non-domestic violence.

In addition to the set of reports, we had an initial thesaurus, a collection of 123 domain-specific terms, at our disposal, which was obtained by performing frequency analyses on the set of police reports.
The terms that occurred most often were retrieved and added to the initially empty thesaurus. Each police report was then searched for each of these terms. The result was a cross table in which a cross indicated that the corresponding police report contained the corresponding term.

5 Analyses and results

In this section, we showcase the possibilities of FCA as a knowledge discovery and knowledge enrichment technique. The knowledge discovery process using FCA is summarised and displayed in Figure 5.

Fig. 5. Detailed human-centered knowledge discovery process using FCA

It is clear that the process displayed in Figure 5 contains an iterative learning loop. Initially, an FCA lattice is constructed based on expert prior knowledge, the terms contained in the thesaurus and the police reports contained in the dataset. Then, the FCA lattice is analysed by the exploratory data analyst and domain expert. Based on the results obtained through the analysis process, which is described in the subsequent paragraphs and demonstrated in detail in the next subsections, a new lattice can be constructed. The FCA lattices are used as an instrument to discover new case labelling rules and to enrich, test and refine expert prior knowledge. Furthermore, the FCA lattices are used to browse and annotate the collection of police reports and efficiently select representative reports for in-depth manual inspection.

The first major aspect of the process consists in searching these reports for new attributes that can be used to discriminate between the domestic and non-domestic violence reports or that may lead to an enrichment of existing domain knowledge. New referential terms were not acquired and selected using a term extractor, but they were obtained by carefully reading some representative reports and then selecting relevant terms as attributes. We built in the necessary validation mechanisms, such as synonym lists, spell checking, etc.
to ensure the completeness of the thesaurus. During the research the thesaurus was under constant evolution: when new terms and concepts were discovered, the terms were added to the thesaurus. Because of the large number of police reports in the dataset, it was not possible to visually analyse concept lattices containing more than 14 attributes. Therefore, terms with a similar semantic meaning or referring to the same domain concept were clustered by the domain experts. When these term clusters were used to create an FCA lattice, they were considered as attributes. This approach ensured that the thesaurus remained at all times a reflection of the already gained knowledge. The second major aspect of the process consists of verifying the correctness of the labels assigned by police officers to the selected cases and searching the reports for missing values and inconsistencies. This allowed for the discovery of faulty case labellings and situations that were often not recognised by police officers as domestic or as non-domestic violence. This information was used by the data quality management team to significantly improve the quality of the data contained in the police databases and to improve the way police officers handle domestic violence cases. The information was also useful for the domestic violence programme manager to improve the training of police officers. We also found some regularly occurring confusing situations that could not be uniquely classified as domestic or non-domestic violence based on the domestic violence definition. These situations were presented to the programme manager and were used to enrich, improve and refine the concept and definition of domestic violence. The third major aspect of the process consists in discovering accurate and comprehensible case labelling rules to automatically classify cases as domestic or as non-domestic violence. In the past this turned out to be impossible. 
We found that this was largely due to the incorrect labels assigned by police officers to cases, to the vagueness of the domestic violence definition and to the lack of a high-quality thesaurus. We managed to resolve many of these problems using FCA, resulting in a set of highly accurate and comprehensible classification rules.

All these different aspects of the process, which have only been briefly introduced so far, are discussed more extensively in the next sections.

5.1 Domain exploration starting from expert prior knowledge

In this section it is illustrated how we used prior knowledge to start and guide the exploration of the data. We based our initial lattice on the domestic violence definition, by clustering the terms contained in the thesaurus into term clusters associated with one of the two components of the definition (i.e. prior knowledge incorporation). The definition of domestic violence employed by the police organization of the Netherlands is as follows: “Domestic violence can be characterised as serious acts of violence committed by someone in the domestic sphere of the victim. Violence includes all forms of physical assault. The domestic sphere includes all partners, ex-partners, family members, relatives and family friends of the victim. The notion of family friend includes persons that have a friendly relationship with the victim and (regularly) meet with the victim in his/her home [6].”

We intended to verify whether a report can be classified as domestic violence by checking it for the occurrence of one or more terms related to each of the two components of the domestic violence definition. That is, a case can be labelled as domestic violence if the following two conditions are fulfilled. First, a criminal offence has occurred. This may range from verbal threats over pushing and kicking to even killing the victim.
To verify whether a criminal offence has occurred, the report is searched for terms such as “hit”, “stab” and “kick”. These terms are grouped into the term cluster “acts of violence”. Second, a person in the domestic circle of the victim is involved in the crime. It should be noted that a report is always written from the point of view of the victim and not from the point of view of the officer. A victim always adds “my”, “your”, “her” and “his” when referring to the persons involved in the crime. Therefore, the report is searched for terms such as “my dad”, “my mom” and “my son”. These terms are grouped into the term cluster “family members”. The report is also searched for terms such as “my ex-boyfriend”, “my ex-husband” and “my ex-wife”. These terms are grouped into the term cluster “ex-partners”. Furthermore, the report is searched for terms such as “my nephew”, “her uncle”, “my aunt”, “my step-father” and “his step-daughter”. These terms are grouped under the term cluster “relatives”. Then the report is searched for terms such as “family friend” and “co-occupant”. These terms are grouped into the term cluster “family friends”.

Reports that were assigned the label “domestic violence” have been classified as such by police officers. The remaining reports were classified as non-domestic violence. This results in the lattice displayed in Figure 6.

Fig. 6. Initial lattice based on the police reports from 2007

From an initial inspection of the lattice in Figure 6 it quickly became clear that a lattice containing only term clusters based on the starting definition of domestic violence would not discriminate sufficiently between domestic and non-domestic violence reports (i.e. knowledge enrichment). Many non-domestic violence reports seemed to also contain terms attributed to one or more of the term clusters (i.e. prior knowledge validation).
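The step from term clusters to a lattice described above can be illustrated with a minimal sketch. It assumes naive lower-cased substring matching, three abbreviated term clusters taken from the examples in the text, and three invented one-sentence reports; the function names are hypothetical and this is not the tooling used in the study. The concept enumeration closes every subset of attributes, which is exponential but remains workable for the small attribute sets (at most 14) that were analysed visually.

```python
from itertools import combinations

# Abbreviated term clusters; the terms come from the examples in the text.
TERM_CLUSTERS = {
    "acts of violence": ["hit", "stab", "kick"],
    "family members": ["my dad", "my mom", "my son"],
    "ex-partners": ["my ex-boyfriend", "my ex-husband", "my ex-wife"],
}

# Invented reports, for illustration only.
reports = {
    "r1": "My ex-husband hit me in the kitchen.",
    "r2": "A stranger tried to stab me on the street.",
    "r3": "My dad and my ex-boyfriend argued; my dad kicked the door.",
}


def build_context(reports, clusters):
    """Cross table: map each report id to the clusters whose terms occur in it."""
    context = {}
    for report_id, text in reports.items():
        lowered = text.lower()
        # Naive substring matching (assumption): "hit" would also match "white".
        context[report_id] = {
            cluster
            for cluster, terms in clusters.items()
            if any(term in lowered for term in terms)
        }
    return context


def extent(context, attributes):
    """Reports that have every attribute in `attributes`."""
    return {r for r, attrs in context.items() if set(attributes) <= attrs}


def intent(context, report_ids, all_attributes):
    """Attributes shared by every report in `report_ids`."""
    shared = set(all_attributes)
    for report_id in report_ids:
        shared &= context[report_id]
    return shared


def formal_concepts(context, all_attributes):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    attrs = sorted(all_attributes)
    concepts = set()
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            ext = extent(context, subset)
            concepts.add((frozenset(ext), frozenset(intent(context, ext, attrs))))
    return concepts


context = build_context(reports, TERM_CLUSTERS)
concepts = formal_concepts(context, TERM_CLUSTERS.keys())
```

Each resulting pair is one node of the lattice: the extent is the set of reports at that node and the intent is the set of term clusters they share. Counting how many reports in an extent carry the police label “domestic violence” gives figures of the kind reported in the tables of this section.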
Still, some interesting findings emerged from this lattice and triggered further investigation. These findings are discussed in the next section. The lattice structure also made it possible for us to discover the most frequently occurring types of domestic violence cases for 2007. These are summarised in Table 2.

Table 2. Most frequently occurring types of domestic violence in 2007

Term clusters                                                              % of all domestic violence cases of 2007
“Acts of violence” and “family members” and “partners”                     25%
“Acts of violence” and “family members” and “partners” and “ex-persons”    16%
“Acts of violence” and “family members” and “ex-persons”                   15%
“Acts of violence” and “family members”                                    10%
“Acts of violence” and “family members” and “partners” and “relatives”     6%
“Acts of violence” and “partners”                                          5%

5.2 Prior knowledge testing and referential term discovery

In this section it is demonstrated how we used prior knowledge to guide the exploration of the data. In contrast to what the domain expert initially thought, not all cases labelled as domestic violence by police officers contained terms associated with the two components of the definition (i.e. prior knowledge testing). This led to the discovery of cases that were assigned a wrong label by police officers (i.e. detection of faulty case labellings), to new domain-specific terms that were lacking in the original thesaurus (i.e. referential term discovery) and to a labelling error that was regularly made by police officers (i.e. improvement of training of police officers).

Table 3. Interesting observations from the lattice in Figure 6

                                                                 Non-domestic violence   Domestic violence
No “acts of violence”                                            67                      42
No “acts of violence” and one or more of the persons clusters    61                      19
Only “acts of violence”                                          879                     64

As can be seen from Table 3, a total of 61 (i.e. 42 and 19) domestic violence cases did not contain a term from the “acts of violence” term cluster.
Of these 61 cases, 19 contained a term from one of the clusters containing terms referring to a person in the domestic sphere of the victim. After in-depth manual inspection of these 19 cases, it turned out that they contained other violence terms, such as “abduction”, “strangle” and “deprivation of liberty”, which were lacking in the initial thesaurus. The remaining 42 cases, on the other hand, turned out to be wrongly classified as domestic violence.

Interestingly, some 28% (i.e. 879) of the non-domestic violence reports only contain terms from the “acts of violence” cluster, while there are only 64 domestic violence reports in the dataset that share that characteristic. Manual inspection, again, revealed that more than two thirds of these 64 reports were wrongly classified as domestic violence. For some unknown reason, police officers regularly seem to misclassify burglary, car theft, bicycle theft and street robbery cases as domestic violence. Therefore, terms such as “street robbery”, “burglary” and “car theft” were combined into a new term cluster called “burglary cases”.

5.3 Term clustering and concept discovery

In this section it is shown how new domain-specific terms, discovered by careful analysis of police reports, and terms with a similar semantic meaning, proposed by the domain expert, were clustered together in term clusters (i.e. term clustering). These term clusters led to the discovery of new concepts that were lacking in the domain expert’s conception of the problem area (i.e. concept discovery). Additionally, two new term clusters based on prior knowledge were introduced (i.e. prior knowledge incorporation). These new term clusters were used to construct the second FCA lattice.

When browsing a sample of the remaining police reports, we spotted some interesting terms that led to the discovery of two new and important concepts that were lacking in the domain expert’s conception of the problem area.
The reports contained terms such as “I had a relationship with”, “relational problems” and “marriage problems”. These terms typically refer to the concept of a broken relationship, which is why they were brought together into the cluster “relational problems”. A distinction was made between a broken relationship and an ongoing relationship. Terms such as “I have a relationship with” and “live together” were brought together in the cluster “in a relationship”.

According to the literature, domestic violence is a phenomenon that mainly occurs inside the house [4, 5, 6, 21]. Therefore, an attribute called “private locations” was introduced. This term cluster contained terms such as “bathroom”, “living room” and “bedroom”. An attribute called “public locations” was also introduced.

To summarise, although the initial lattice in Figure 6 could not be used to effectively distinguish domestic violence reports from non-domestic violence reports, it could be used to detect cases that were wrongly classified as domestic violence. Also, it helped in discovering new attributes that turned out to be missing in the user’s understanding of the problem area. The redefined lattice structure, taking into account the above analyses, is displayed in Figure 7. In order to keep the lattice comprehensible, the terms belonging to the clusters “family members”, “relatives”, “partners”, “ex-partners” and “family friends” have been lumped into a cluster “persons”.

Fig. 7. First refined lattice based on the police reports from 2007

5.4 Detecting faulty case labellings and confusing situations

In this section, we demonstrate how FCA was used to detect faulty case labellings and situations that are confusing to police officers. This was used to improve the training of police officers, to enrich and refine the domestic violence definition and to improve the quality of the data contained in the police databases.
We also discovered new referential terms and clustered them based on their semantic meaning, leading to a further enrichment of the thesaurus and the existing domain knowledge.

It should be clear from the lattice in Figure 7 that the terms contained in the cluster “relational problems” tend to be associated with domestic violence cases. Some of the more interesting observations from this lattice are displayed in Table 4.

Table 4. Results from the lattice in Figure 7

                         Non-domestic violence   Domestic violence
“relational problems”    58                      365
“private locations”      1340                    1365
“public locations”       1015                    505

Apparently, only 58 non-domestic violence reports contained one or more terms from the “relational problems” cluster. Further investigation revealed that a startling 95% of these cases had been wrongly classified as non-domestic violence. Moreover, about 70% of these cases had in common that a third person made a statement to the police for someone else. For example, one case described a father who made a statement to the police about the sexual abuse of his daughter by her stepfather. This is a clear case of domestic violence. But since it was not the victim who made the statement to the police, the police officer did not recognise it as such.

Analysis of the remaining 30% of these misclassified cases led to the discovery of a new and important concept that was initially lacking from the domain expert’s understanding of domestic violence. Many of the reports turned out to contain terms such as “I was attacked by the new boyfriend of my ex girlfriend” and “I was maltreated by the new girlfriend of my ex boyfriend”. These terms were grouped into the cluster “attack by new friend of ex-person”. Police officers and policy makers confirmed that this type of situation was to be seen as domestic violence, mainly because the perpetrator often aims at emotionally hurting the ex-partner.
Consequently, the expectation was for the terms contained in this cluster to frequently occur in domestic violence reports. However, this turned out to be incorrect. It became clear from the investigation that this type of situation in general was very confusing to police officers. A quick scan revealed that more than 50% of police officers actually had trouble with this. The ensuing investigation and discussions with police officers and policy makers revealed that this situation needed to be addressed during the training of police officers.

Several interesting cases like the previous one were picked up during the data exploration. All of them gave rise to a clearer insight into the nature of domestic violence.

5.5 Prior knowledge incorporation and testing

In this section we demonstrate how expert prior knowledge was incorporated into the FCA knowledge discovery process. It is also made clear how we used FCA to verify the correctness and the practical usefulness of this prior knowledge.

Most of the domestic violence cases under scrutiny (1365 cases or 82%) contained one or more terms from the “private locations” term cluster. However, 1340 (42%) of the non-domestic violence cases also contained one or more terms from this same term cluster. In addition, a hypothesis that was formulated prior to the data exploration was that almost no domestic violence case was expected to have taken place on the street. Surprisingly, this hypothesis was proven incorrect by the data. In about one-fourth of the domestic violence cases there had been an incident at a public location. While scrutinising these police reports, we discovered that this was often the case when ex-partners were involved. It became apparent that it was not possible to distinguish domestic from non-domestic violence reports by means of the type of locations mentioned in the reports.
Combining the clusters “private locations” and “public locations” with clusters such as “family members” or “ex-persons”, for example, did not yield the expected results in terms of discriminatory power.

5.6 Definition refinement: niche cases

In this section we focus on how FCA was used to enrich and refine the operationally employed domestic violence definition. Using FCA, we discovered multiple niche cases, which were presented to the domestic violence programme manager. This resulted in an enrichment of the domain knowledge, a refinement of the domestic violence definition and an improvement of the training of police officers.

We continued our knowledge discovery exercise in search of additional attributes to help us distinguish domestic violence from non-domestic violence reports. We noticed that in a large number of the domestic violence cases (416 cases or 28%) the perpetrator and the victim happened to live at the same address at the time the victim made their statement to the police. Most of these cases (379 cases or 91%) were classified as domestic violence. When studying the remaining 37 non-domestic violence cases more carefully, we found, much to our surprise, that the perpetrator and the victim often lived together in the same institution (e.g. a youth institution, a prison or a retirement home). It turned out that of the 41 cases where the perpetrator and the victim lived in the same institution only 30 actually had been classified as cases of domestic violence. This finding brought about a lively discussion amongst the police officers of the Amsterdam police force. More importantly, it exposed the discord amongst police officers on how to classify such cases. We took note of all their reflections and presented them to the board members responsible for the domestic violence policy. After intensive debate the following classification guidelines, displayed in Table 5, were obtained.

Table 5.
Classification guidelines for incidents involving inhabitants of the same institution

Perpetrator                            Victim                                 Classification
Caretaker                              Inhabitant                             Domestic violence
Inhabitant                             Caretaker                              Non-domestic violence
Inhabitant younger than 18y            Inhabitant younger than 18y            Domestic violence
Inhabitant older than 18y              Inhabitant older than 18y              Non-domestic violence
Inhabitant of prison older than 18y    Inhabitant of prison older than 18y    Individual evaluation
Inhabitant older than 18y              Inhabitant younger than 18y            Domestic violence
Inhabitant younger than 18y            Inhabitant older than 18y              Individual evaluation

The presence or absence of a dependency relationship between the perpetrator and the victim was in the end the decisive factor for classifying a case as either domestic or as non-domestic violence. The non-domestic violence cases where the perpetrator and the victim lived at the same address and were not inhabitants of an institution turned out to be wrongly classified as non-domestic violence. Therefore, a new attribute called “institution” was introduced.

5.7 Missing values detection

In this section, we demonstrate how FCA was used to detect missing values and inconsistencies in police reports. We also show how we exposed inefficiencies in the overall domestic violence policy employed by the police, using FCA.

Another interesting finding that emerged in our search for novel and potentially interesting classification attributes was that some 34% of the reports (1623 cases) did not mention a suspect. According to the domestic violence definition (which specifies that the perpetrator must belong to the domestic circle of the victim), the offender has to be known in domestic violence cases. Naturally, we had assumed that these reports described non-domestic violence cases. Nevertheless, when looking into these cases, we found that 181 of them turned out to describe domestic violence cases after all.
Analysis revealed that this was a result of police officers’ rather haphazard ways of registering suspects for these cases. Apparently, while some officers immediately registered a suspect at the moment the victim mentioned this person as a suspect, others preferred to first interrogate them before casting the label of suspect. In the latter cases, the person then would just be added to the list of persons who were said to be involved in or witnessed the crime. Because such lists included friends, family members or bystanders, they could potentially be very extensive and diverse, which is why suspects easily got lost in these lists. When we inquired about the proper policy regarding the labelling of suspects, we were told there simply was none. Our analysis made a strong case for the need of such a policy. In the end, the quick-win proposal that could be implemented to solve this issue involved a relatively simple change to the registration software: an additional data entry field would need to be introduced for police officers to register the persons that were mentioned by the victim as offenders.

Classification of police reports can only be performed on the basis of comprehensible and correct rules that do not inflate the false negative rate, while minimizing the false positive rate. Automatically assigning the non-domestic violence label to a case that does not mention a suspect is thus unacceptable because of the high false negative rate. Nevertheless, we found out that some 44% of the reports (711 cases) that lacked a labelled suspect did contain a description of the actual suspect. Of these 711 cases, only 16 reports were classified as domestic violence. After studying these 16 reports, we discovered that the majority of them were wrongly classified as domestic violence. Classifying cases as non-domestic violence because they lack a labelled suspect and contain a description of the suspect was thus acceptable.
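The arithmetic behind this judgement can be made explicit with a small sketch. It uses only the counts quoted above (1623 reports without a labelled suspect, of which 181 were domestic violence; 711 of those with a description of the suspect, of which 16 carried the domestic violence label). The function name is hypothetical, and the shares are computed against the labels assigned by police officers, which the text notes were themselves partly incorrect.

```python
def label_shares(non_domestic, domestic):
    """Share of each police label among the reports matched by a candidate rule."""
    total = non_domestic + domestic
    return {
        "total": total,
        "domestic_share": domestic / total,
        "non_domestic_share": non_domestic / total,
    }


# Candidate rule "no suspect": 1623 reports, 181 of them domestic violence.
no_suspect = label_shares(non_domestic=1623 - 181, domestic=181)

# Candidate rule "no suspect" and "description of suspect": 711 reports, 16 domestic.
with_description = label_shares(non_domestic=711 - 16, domestic=16)

for name, shares in [("no suspect", no_suspect),
                     ("no suspect + description", with_description)]:
    print(f"{name}: {shares['total']} reports, "
          f"{shares['domestic_share']:.1%} labelled domestic violence")
```

Labelling every report matched by a rule as non-domestic violence turns the domestic share into missed domestic violence cases, which is why the first rule was rejected and the second, much narrower rule was considered acceptable.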
All of this newly discovered knowledge can once again be added to the lattice in Figure 7. When we introduce the attributes “same address”, “no suspect” and “description of suspect” to this lattice, this results in the refined lattice structure displayed in Figure 8.

Fig. 8. Second refined lattice based on the police reports from 2007

The lattice in Figure 8 proved to be of much more use for discriminating domestic from non-domestic violence reports. We summarised some of the most interesting findings embedded in that lattice structure in Table 6.

Table 6. Results from the lattice in Figure 8

                                                              Non-domestic violence   Domestic violence
Acts of violence and same address                             37                      379
Acts of violence and no suspect and description of suspect    695                     16
Acts of violence and no suspect                               1442                    181

5.8 Discovering accurate and comprehensible classification rules

In this section we focus on how we used FCA to discover accurate and comprehensible classification rules. We also illustrate how FCA can play a key role in detecting faulty case labellings.

While further exploring the domestic violence reports, it became apparent that in many cases the victim made statements such as “I want to institute legal proceedings against my husband” and “I want to institute legal proceedings against my brother”. These sentences were brought together into the cluster “legal proceedings against domestic sphere”. Another type of phrasing that was regularly used by victims of domestic violence was, for example, “the crime was committed by my dad” or “the crime was committed by my ex-boyfriend”. These sentences were brought together into the cluster “committed by domestic sphere”. Yet another type of wording that was also frequently used by a victim was phrases such as “I was maltreated by my husband” and “I was threatened by my ex-partner”. These sentences in turn were brought together into the cluster “threatened by domestic sphere”.
Finally, neighbourhood quarrels (non-domestic violence) often made reference to phrases such as “I want to institute legal proceedings against my neighbour” and “committed by the man next door”, so these sentences were combined into the cluster “neighbours”. Thus, the lattice was further refined and the result is displayed in Figure 9, with some of the most interesting facts summarised in Table 7.

Table 7. Results from the lattice displayed in Figure 9

                                                 Non-domestic violence   Domestic violence
“legal proceedings against domestic sphere”      19                      266
“committed by domestic sphere”                   5                       81
“threatened by domestic sphere”                  4                       98
“neighbours”                                     67                      5

After browsing the 19 non-domestic violence cases in which the victim used one or more terms from the “legal proceedings against domestic sphere” cluster, it turned out that these reports should have been classified as domestic violence. The same observation was made when the 5 non-domestic violence reports containing a term from the “committed by domestic sphere” cluster and the 4 non-domestic violence cases containing a term from the “threatened by domestic sphere” cluster were analysed. In-depth investigation of the 5 domestic violence cases in which a term from the “neighbours” cluster occurred showed that these reports should have been classified as non-domestic violence.

Fig. 9. Third refined lattice based on the police reports from 2007

5.9 Operational validation

In this section, we clarify how FCA was used for the validation of some aspects of operational policing practice. For some specific situations it was verified whether police officers possessed sufficient knowledge about the problem area to recognise these cases as domestic violence. Some very important special domestic violence situations were considered, including incest and honour-related violence. For the first type of situation, reports were searched for terms such as “incest” and “sexual abuse by my father”.
For the second type of situation, reports were searched for terms such as “marriage of convenience” and “marry off”. The resulting lattice after incorporating these special cases is displayed in Figure 10. Table 8 summarises the classification.

Table 8. Results from the lattice displayed in Figure 10

                              Non-domestic violence   Domestic violence
“incest”                      7                       8
“honour-related violence”     2                       18

Careful inquiry into these cases taught us that police officers regularly misclassified incest cases as non-domestic violence. On the other hand, even for insiders it was quite surprising to observe how almost all honour-related violent incidents ended up being correctly classified as domestic violence. The latter was probably attributable to the intensive sensitisation campaigns organised to inform police officers of this important societal problem.

Fig. 10. Fourth refined lattice based on the police reports from 2007

6 Validation experiment

In this section we elaborate on the run-time power of the distilled knowledge. We start by mapping the proposed lattice structures obtained during discovery of the 2007 police data on the police reports from 2006. We demonstrate that the findings obtained through in-depth analysis of the 2007 police data are also valid for the police reports from 2006. Then, we apply the discovered knowledge to automatically classify the output of the in-place case triage system. Finally, we demonstrate how the newly discovered knowledge was used to detect and reclassify filed reports that were incorrectly labelled by police officers.

For the classification rules discovered in section 5, we verified how many domestic and non-domestic violence reports correspond to each rule. The rules and these counts are represented in Table 8. For the first eight rules, the non-domestic violence cases turned out to be incorrect labellings performed by police officers.
For rules 9 up to 13, the domestic violence cases turned out to be incorrect labellings performed by police officers. Using rule 14, we found that in 160 cases that were classified as domestic violence by police officers a formally labelled suspect was lacking.

Table 8. Discovered knowledge applied to police reports from 2006

Rule                                                  Non-domestic violence   Domestic violence
Domestic violence rules
1. “legal proceedings against domestic sphere”        24                      237
2. “committed by domestic sphere”                     9                       101
3. “threatened by domestic sphere”                    11                      106
4. “incest”                                           0                       3
5. “attack by new friend of ex-person”                6                       12
6. “relational problems”                              61                      364
7. “same address” and not in “institution”            23                      299
8. “honour-related violence”                          1                       16
Non-domestic violence rules
9. “burglary cases”                                   32                      24
10. “neighbours”                                      13                      6
11. “no suspect” and “description of suspect”         504                     15
12. no “acts of violence”                             30                      38
13. “acts of violence” and no “persons”               865                     94
Data quality check extra
14. “no suspect”                                      1074                    160

For classification, the protocol is as follows. When a case comes in for labelling, the first step consists in verifying whether one of the domestic violence rules is satisfied. If this is the case, the case is classified as domestic violence. If the “no suspect” rule or one of the non-domestic violence rules turns out to be also satisfied, the case is sent to the data quality management team, because there probably is a data quality problem. Otherwise, it is verified whether one of the non-domestic violence rules is satisfied. If this is the case, the case is classified as non-domestic violence. Otherwise, the case is left unclassified. By applying the first thirteen rules in Table 8, 50% of the dataset of 2006 could be automatically and correctly classified.

A further validation encompassed the application of the discovered knowledge to automatically classify the output of the in-place case triage system.
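The labelling protocol described above can be sketched as follows. This is one possible reading of the protocol, not the implementation used by the police: a case is assumed to be represented by the set of term clusters and attributes found in its report, the attribute names follow Table 8, and the function name and the `needs_review` flag are hypothetical.

```python
# Rules 1-8 of Table 8: a match suggests domestic violence.
DOMESTIC_RULES = [
    lambda a: "legal proceedings against domestic sphere" in a,   # rule 1
    lambda a: "committed by domestic sphere" in a,                # rule 2
    lambda a: "threatened by domestic sphere" in a,               # rule 3
    lambda a: "incest" in a,                                      # rule 4
    lambda a: "attack by new friend of ex-person" in a,           # rule 5
    lambda a: "relational problems" in a,                         # rule 6
    lambda a: "same address" in a and "institution" not in a,     # rule 7
    lambda a: "honour-related violence" in a,                     # rule 8
]

# Rules 9-13 of Table 8: a match suggests non-domestic violence.
NON_DOMESTIC_RULES = [
    lambda a: "burglary cases" in a,                                # rule 9
    lambda a: "neighbours" in a,                                    # rule 10
    lambda a: "no suspect" in a and "description of suspect" in a,  # rule 11
    lambda a: "acts of violence" not in a,                          # rule 12
    lambda a: "acts of violence" in a and "persons" not in a,       # rule 13
]


def classify(attributes):
    """Return (label, needs_review) for a case given its set of attributes."""
    attrs = set(attributes)
    domestic = any(rule(attrs) for rule in DOMESTIC_RULES)
    non_domestic = any(rule(attrs) for rule in NON_DOMESTIC_RULES)
    if domestic:
        # Rule 14 ("no suspect") or a conflicting non-domestic rule signals a
        # probable data quality problem; the case is flagged for the data
        # quality management team.
        needs_review = "no suspect" in attrs or non_domestic
        return "domestic violence", needs_review
    if non_domestic:
        return "non-domestic violence", False
    return "unclassified", False
```

For example, `classify({"acts of violence", "persons", "relational problems"})` yields the domestic violence label without a review flag, whereas adding `"no suspect"` to the same set keeps the label but flags the case for the data quality management team.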
For example, going back to 2006, the system retrieved 1157 cases, 80% of which actually turned out to be non-domestic violence cases. It is to deal with these shortcomings in the current system that the rules in Table 8 will prove to be extremely useful. Some 9% of the cases contained terms from the “committed by domestic sphere”, “threatened by domestic sphere” or “legal proceedings against domestic sphere” clusters and could be automatically classified as domestic violence. About 10% of the cases contained one or more terms from the “relational problems” cluster and could for that reason be automatically classified as domestic violence. A further 11% of the retrieved cases could be classified as domestic violence simply because the perpetrator and the victim lived at the same address, which was not an institution. About 18% of the retrieved cases did not mention a suspect. If the policies we proposed had been implemented, these could all have been classified as non-domestic violence. Some 5% of the cases lacked a formally designated suspect but contained a description of a suspect. These cases could be classified as non-domestic violence. Another 14% of the cases retrieved by the triage system in 2006 could immediately be classified as non-domestic violence. They all contained one or more terms from the “acts of violence” cluster and none from the “persons” cluster. In sum, 514 of the 1157 cases retrieved by the triage system in 2006 (about 44%) could be correctly classified in an automated way when making use of the newly discovered knowledge. These findings are displayed in Table 9.

Table 9.
% of the 2006 cases classified automatically

                                                                                  % retrieved cases classified automatically
Current situation                                                                 0%
Applying first 13 discovered rules from Table 8                                   44%
Adding data field for suspect mentioned by victim to police registration form     54%

The proposal that could be implemented to solve this issue involves a rather small change to the triage software: incorporating the first thirteen rules from Table 8 into the existing triage model. As a result, about 44% of the retrieved cases will be automatically classified correctly. Moreover, if an additional data field for the suspect mentioned by the victim is added to the police registration form, the fourteenth rule of Table 8 can also be integrated into the triage model. This would result in an automatic and correct classification of about 54% of the retrieved cases.

An additional result is that a large number of the filed reports that were wrongly classified can now be automatically detected and corrected, the results of which are displayed in Table 10.

Table 10. Number of filed reports that were incorrectly classified, but corrected by means of the 13 rules

                      Non-domestic corrected to Domestic   Domestic corrected to Non-domestic   Total
Year 2006             135                                  110                                  245
Year 2007             124                                  88                                   212
First quarter 2008    54                                   24                                   78

Using the newly discovered rules, many of these incorrectly classified police reports can be automatically detected and reclassified. For example, for the year 2007, we found 212 filed police reports that were incorrectly classified.

7 Conclusions

Domestic violence is one of the top priorities of the Amsterdam-Amstelland police. When a victim makes a statement to the police, police officers are given the possibility to indicate whether it is a domestic violence case. Still, this has proven to be problematic. The use of FCA, however, can play a significant role in overcoming some of the hurdles encountered when dealing with domestic violence cases.
This paper specifically showcased the possibilities of using FCA for knowledge discovery from police reports. The FCA lattices prove to be very useful as knowledge browsers. The construction of an initial lattice containing term clusters created by a domain expert on the basis of the domestic violence definition and the incremental refinement of this lattice was shown to provide a powerful framework for exploring unstructured data. First, it was shown that the domestic violence definition is too vague, making it hard to use it effectively for classification purposes. Moreover, the scope of terms such as ex-partners and violence was not communicated anywhere in the definition. Second, we exposed that there exists a considerable amount of confusion amongst police officers about the nature and scope of domestic violence. Regularly occurring domestic violence situations such as incest or an ex-boyfriend attacking the new boyfriend of a girl were often not recognised as such by police officers. Third, using FCA, we were able to discover some essential characteristics that discriminate domestic from non-domestic violence reports. These characteristics include phrasings, words and word combinations that typically occur in either domestic or non-domestic violence cases.

This newly discovered knowledge was then used to automatically assign a label to the cases retrieved by the in-place case triage system. It turned out to be possible to automatically and correctly classify about 44% of the cases that used to be set aside for manual inspection. Moreover, a large part of the filed reports that were incorrectly classified could be automatically detected and reclassified.

Acknowledgements

The authors would like to thank the police of Amsterdam-Amstelland for granting them the liberty to conduct and publish this research. In particular, we are most grateful to Deputy Police Chief Reinder Doeleman and Police Chief Hans Schönfeld for their continued support.
Jonas Poelmans is an aspirant of the Fonds voor Wetenschappelijk Onderzoek – Vlaanderen (Research Foundation – Flanders).

References

[1] Office on Violence against Women (2007) About Domestic Violence (http://www.usdoj.gov/ovw/domviolence.htm). Retrieved on 2007-10-22.
[2] Watts, C., Zimmerman, C. (2002) Violence against women: global scope and magnitude. The Lancet 359 (9313): pp. 1232-1237. RMID 1155557.
[3] Waits, K. (1985) The criminal Justice System's response to Battering: Understanding the problem, forging the solutions. Washington Law Review 60: pp. 267-330.
[4] Vincent, J.P., Jouriles, E.N. (2000) Domestic violence. Guidelines for research-informed practice. Jessica Kingsley Publishers, London and Philadelphia.
[5] Black, C.M. (1999) Domestic violence: Findings from a new British Crime Survey self-completion questionnaire. London: Home Office Research Study.
[6] Keus, R., Kruijff, M.S. (2000) Huiselijk geweld, draaiboek voor de aanpak. Directie Preventie, Jeugd en Sanctiebeleid van de Nederlandse justitie.
[7] Yevtushenko, S.A. (2000) System of data analysis "Concept Explorer". Proceedings of the 7th National Conference on Artificial Intelligence KII-2000, 127-134, Russia.
[8] Ganter, B., Wille, R. (1999) Formal Concept Analysis: Mathematical Foundations. Springer, Heidelberg.
[9] Wille, R. (1982) Restructuring lattice theory: an approach based on hierarchies of concepts. In: I. Rival (ed.), Ordered Sets. Reidel, Dordrecht-Boston, 445-470.
[10] Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2010) Formal Concept Analysis in knowledge discovery: a survey. Lecture Notes in Computer Science, 6208, 139-153, 18th International Conference on Conceptual Structures (ICCS): From Information to Intelligence, 26-30 July, Kuching, Sarawak, Malaysia. Springer.
[11] Wille, R. (2002) Why can concept lattices support knowledge discovery in databases? Journal of Experimental & Theoretical Artificial Intelligence, 14 (2), 81-92.
[12] Stumme, G., Wille, R., Wille, U. (1998) Conceptual Knowledge Discovery in Databases Using Formal Concept Analysis Methods. In: J.M. Zytkow, M. Quafafou (eds.), Principles of Data Mining and Knowledge Discovery, Proc. 2nd European Symposium on PKDD '98, LNAI 1510, Springer, Heidelberg, 450-458.
[13] Stumme, G. (2002) Efficient Data Mining Based on Formal Concept Analysis. Lecture Notes in Computer Science Vol. 2453, Springer, Heidelberg, 3-22.
[14] Stumme, G. (2002) Formal Concept Analysis on its Way from Mathematics to Computer Science. Proc. 10th Intl. Conf. on Conceptual Structures (ICCS 2002). LNCS, Springer, Heidelberg.
[15] Priss, U. (2000) Lattice-based Information Retrieval. Knowledge Organization, 27 (3), 132-142.
[16] Godin, R., Gecsei, J., Pichet, C. (1989) Design of browsing interface for information retrieval. In: N.J. Belkin, C.J. van Rijsbergen (Eds.), Proc. SIGIR '89, 32-39.
[17] Carpineto, C., Romano, G. (2005) Using concept lattices for text retrieval and mining. In: Formal Concept Analysis: State of the Art, Proc. of the First International Conference on Formal Concept Analysis, Berlin, Springer.
[18] Cole, R., Eklund, P. (2001) Browsing Semi-structured Web Texts Using Formal Concept Analysis. In: H. Delugach, G. Stumme (Eds.), Conceptual Structures: Broadening the Base, LNAI 2120, Berlin, Springer, 319-332.
[19] Eklund, P., Ducrou, J., Brawn, P. (2004) Concept Lattices for Information Visualization: Can Novices Read Line Diagrams? In: P. Eklund (Ed.), Concept Lattices: Second International Conference on Formal Concept Analysis, LNCS 2961, Berlin, Springer, 14-27.
[20] Priss, U. (1997) A Graphical Interface for Document Retrieval Based on Formal Concept Analysis. In: E. Santos (Ed.), Proc. of the 8th Midwest Artificial Intelligence and Cognitive Science Conference. AAAI Technical Report CF-97-01, 66-70.
[21] Beke, B.M.W.A., Bottenberg, M. (2003) De vele gezichten van huiselijk geweld. In opdracht van Programma Bureau Veilig / Gemeente Rotterdam. Uitgeverij SWP Amsterdam.
[22] van Dijk, T. (1997) Huiselijk geweld, aard, omvang en hulpverlening. Ministerie van Justitie, Dienst Preventie, Jeugdbescherming en Reclassering, oktober 1997.
[23] Peirce, Ch. S. (1992) Reasoning and the Logic of Things: The Cambridge Conferences Lectures of 1898. Edited by K.L. Ketner and H. Putnam. Cambridge: Harvard University Press.
[24] Arnauld, A., Nicole, P. (1985) La logique ou l'Art de penser. Edition Gallimard.
[25] Collier, P.M. (2006) Policing and the intelligent application of knowledge. Public Money & Management, Vol. 26, No. 2, pp. 109-116.
[26] Collier, P.M., Edwards, J.S., Shaw, D. (2004) Communicating knowledge about police performance. International Journal of Productivity & Performance Management, Vol. 53, No. 5, pp. 458-467.
[27] Chen, H., Chung, W., Xu, J.J., Wang, G., Qin, Y., Chau, M. (2004) Crime data mining: a general framework and some examples. IEEE Computer, April 2004.
[28] Ananyan, S. (2004) Crime Pattern Analysis Through Text Mining. Proceedings of the Tenth Americas Conference on Information Systems, New York, New York, August 2004.
[29] Brachman, R., Anand, T. (1996) The process of knowledge discovery in databases: a human-centered approach. In: U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
[30] Brachman, R.J., Selfridge, P.G., Terveen, L.G., Altman, B., Borgida, A., Halper, F., Kirk, T., Lazar, A., McGuinness, D.L., Resnick, L.A. (1993) Integrated support for data archaeology. International Journal of Intelligent and Cooperative Information Systems, 2: 159-185.
[31] Raaijmakers, S.A., Kraaij, W., Dietz, J.B. (2007) Automatische detectie van huiselijk geweld in processen-verbaal. TNO-rapport 34293.
[32] Politie Amsterdam-Amstelland (2008) http://www.politie-amsterdam-amstelland.nl/get.cfm?id=86. Retrieved on 2008-02-22.
[33] Smyth, P., Pregibon, D., Faloutsos, C. (2002) Data-driven evolution of data mining algorithms. Communications of the ACM, Vol. 45, No. 8.
[34] Fayyad, U., Uthurusamy, R. (2002) Evolving data mining into solutions for insights. Communications of the ACM, Vol. 45, No. 8.
[35] Cole, R.J. (2000) The management and visualization of document collections using Formal Concept Analysis. Ph.D. Thesis, Griffith University.
[36] Alexander, C. (1965) A city is not a tree. Architectural Forum, Vol. 122, No. 1, April 1965, pp. 58-62 (Part I) and Vol. 122, No. 2, May 1965, pp. 58-62 (Part II).
[37] Poelmans, J., Elzinga, P., Viaene, S., Dedene, G. (2008) An exploration into the power of formal concept analysis for domestic violence analysis. Lecture Notes in Computer Science, 5077, 404-416, Advances in Data Mining: Applications and Theoretical Aspects, 8th Industrial Conference (ICDM), Leipzig, Germany, July 16-18, 2008, Springer.
[38] Poelmans, J., Elzinga, P., Viaene, S., Van Hulle, M., Dedene, G. (2009) Gaining insight in domestic violence with emergent self organizing maps. Expert Systems with Applications, 36 (9), 11864-11874.
[39] Viaene, S., De Hertogh, S., Lutin, L., Maandag, A., den Hengst, S., Doeleman, R. (2009) Intelligence-led policing at the Amsterdam-Amstelland police department: operationalized business intelligence with an enterprise ambition. Intelligent Systems in Accounting, Finance and Management, 16 (4), 279-292.