Trustworthy Records and Open Data

Anne Catherine Thurston, International Records Management Trust

People assume that good economic data is there, but if it is not, work is flawed or not possible. Data should come from records – the veracity of the data depends upon the record. The quality of the records management system makes you trust or doubt data.1
Bill Dorotinsky, World Bank Sector Manager, Public Sector
and Institutional Reform, Europe and Central Asia/ Co-Leader, Global Expert Team

 

Our data and record-keeping crisis has always been with us.  There are a few things that are basic to development, growth and the very being of a nation - quality data gathering, storage and retrieval is one. Proper record keeping and archiving is another.  These are the soul, body and spirit of a nation. At 50, we have never had an accurate census, people die daily in Nigeria and they are literally cast into the earth unrecorded.  Thousands are born daily without records. No one in Nigeria can tell with certainty, how many policemen, soldiers or civil servants there are today in service.
Editorial, The Nation, Lagos, Data Disaster, August 1, 2011
http://www.thenationonlineng.net/2011/index.php/editorial/14224-data-disaster.html

 

Overview

The success of Open Government, in terms both of proactive disclosure (Open Data) and reactive disclosure (Freedom of Information/ Right to Information) rests ultimately on governments’ ability to create and maintain reliable, trustworthy and accurate information (records and data) and on people’s ability to access it. Public authorities need to know what information they hold, to be able to retrieve the information efficiently and to be accountable through this information.  Citizens and investors need to know that they can trust the information that governments provide. When datasets are released through Open Government portals, citizens have the right to expect that the data will be accurate and that privacy will be protected.  Similarly, citizens, journalists and others who make requests under access legislation have the right to expect that they will be provided with accurate and authentic information.  

There is a growing and widely held assumption that Open Data will provide the basis for openness in the future. This paper suggests that real openness must ultimately build upon a foundation of reliable, high quality source records that document government policies, activities and transactions. While it is very valuable to have data flow freely onto the web, the reality is that most government information lies submerged, like the larger part of an iceberg below the waterline, and unless this information is managed, openness is limited and governments cannot be held accountable. Alongside the enthusiasm for Open Data, therefore, it is important to take a deeper look at the factors that make information, both records and data, trustworthy. There is an opportunity to make a more substantial contribution to transparency, accountability, anti-corruption, citizens’ rights and economic development by linking Open Data to accurate, reliable, trustworthy records.

This article identifies key records-related issues that threaten the success of Open Government and Open Data initiatives, and it highlights the value of the internationally agreed standards and solutions developed by the records profession2. It relates principally to Open Data, but many of the issues it raises are also critical for Freedom of Information/ Right to Information, which remains vital as a means of accessing the large areas of government information, for instance internal email communications, that are unlikely to be part of Open Data.

Issue One:  Significance of Well Managed Records for Reliable Data

The Open Data movement uses the power of technology and the Internet to proactively disclose government-generated information, both to catalyse openness and transparency and to foster economic development.  In both cases, the anticipated benefits are only possible if the records from which the data are derived are complete and have integrity.  Open Data should enable citizens to take greater ownership of and participate more fully in the life of public institutions by enabling citizens to monitor how government money is spent and to hold public officials accountable. For instance, when data about government spending, aid flows and geographical disbursement is mapped against other data such as crime statistics, school results or hospital infection rates, citizens can identify discrepancies in spending. At the same time, Open Data should enhance economic and social development, for instance by enabling research leading to product development, attracting investors and supporting industry in achieving a market advantage.  Inaccurate data will undermine these goals.

Early work on Open Data has focused on releasing datasets without a methodology for ensuring their accuracy and their traceability to reliable information sources. Government data relies heavily on evidence derived from official government records, and in many countries, public records are not managed in accordance with international standards. Often even basic records management controls are not in place, particularly in the digital environment. Without these controls, records are likely to be incomplete and difficult to locate and authenticate; where they exist, they can be easily manipulated, deleted, fragmented or lost. Poorly kept records result in inaccurate or incomplete data, which can lead to misunderstanding and misuse of information, cover-up of fraud, skewed findings and statistics, misguided policy recommendations and misplaced funding, all with serious consequences for citizens’ lives.

While data can be a valuable indicator, to be trusted it needs to be substantiated.  Can the records from which the data are derived be trusted? Are they complete? Are they authentic? How were they generated, by whom and in what conditions?  Is there sufficient contextual information to enable them to be understood?  Are they being captured and held securely to allow comparisons over time?

The degree to which Open Data initiatives will provide a means of holding governments to account and support economic development depends in large measure on the government’s ability and willingness to manage public records and make them available. In the US and the UK, the availability of large datasets and their relatively high degree of reliability have made it possible to provide a growing volume of meaningful data to citizens. In the US, data.gov is a curated web portal that uses standardised metadata templates to facilitate search and retrieval of information by a wide range of interests, from individual citizens to research groups to large corporations operating at the international level. In the UK, the Office of Public Sector Information, operating from within the National Archives, sets standards, delivers access and encourages the re-use of public sector information. There are still concerns about the quality, accuracy and integrity of the data, especially where vetting has not been carried out rigorously, but both countries have strong records management traditions that give them the ability to establish a significant degree of trustworthiness and hence to create and share relatively reliable datasets.

As the high expectations for Open Data are translated into developing country contexts, questions need to be asked about the veracity of the data available and the relationship of the data to the records that should support it. In countries where governments do not address records management, citizens accessing data through Open Data portals may be provided with unstructured, uncontrolled data; they may receive partial, incomplete or misleading information. As the Open Data movement develops and expectations evolve, it will be important to do more than simply open datasets to the public. Citizens, investors and other users will want to know how decisions have been taken, who has taken what actions and why.

As an example, many countries have high percentages of ghost workers on their payrolls.  People who are not working are being paid from funds that should be used to provide services such as education and healthcare, and the people who are working may not be paid a living wage.  Payroll data, in these circumstances, is virtually meaningless.  Censuses of civil servants or teachers can provide a snapshot in time as a step toward control, but ultimately, control is achieved most effectively when verified employment records are linked to biometric data and mapped against the payroll to provide the evidence of accuracy. In Sierra Leone, for instance, it has been possible to reduce the Civil Service payroll by 14% by establishing control of pay and personnel records3. Once the necessary controls are in place and maintained, it is possible to provide meaningful payroll data for planning, management and community monitoring purposes.
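The mapping of verified employment records against the payroll can be illustrated with a minimal sketch. It is not a description of the method used in Sierra Leone or in any particular payroll system: the record structures, field names and sample values below are hypothetical, and biometric verification is reduced to a simple flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmploymentRecord:
    """A personnel record (hypothetical structure for illustration)."""
    staff_id: str
    biometric_verified: bool  # assumed flag: identity confirmed against biometric data


@dataclass(frozen=True)
class PayrollEntry:
    """One line on the payroll (hypothetical structure for illustration)."""
    staff_id: str
    monthly_pay: float


def find_unsupported_payments(records, payroll):
    """Return payroll entries that have no verified employment record behind them."""
    verified_ids = {r.staff_id for r in records if r.biometric_verified}
    return [p for p in payroll if p.staff_id not in verified_ids]


# Illustrative data only.
records = [
    EmploymentRecord("A001", True),
    EmploymentRecord("A002", True),
    EmploymentRecord("A003", False),  # record exists but identity is not verified
]
payroll = [
    PayrollEntry("A001", 300.0),
    PayrollEntry("A002", 320.0),
    PayrollEntry("A003", 310.0),
    PayrollEntry("A004", 305.0),  # paid, but no employment record at all
]

unsupported = find_unsupported_payments(records, payroll)
unsupported_ids = [p.staff_id for p in unsupported]
unsupported_total = sum(p.monthly_pay for p in unsupported)
```

Entries flagged in this way are candidates for investigation rather than proof of a ghost worker; the point is that the payroll figure only becomes meaningful once each payment can be traced to a verified record.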

Issue Two:  Significance of Context and Integrity

The Open Data approach has evolved from principles developed by the dataset community. The International Association for Social Science Information Services and Technology (IASSIST), whose members are professionals working with information technology and data services, has played a leading role in developing and implementing standards. Its members have drawn on subject-based library cataloguing rules for cataloguing datasets. There has been little involvement of records professionals, whose role is to support the production of accurate and reliable records and who use metadata to capture the context of how the records were created and used and their structure and management through time. Put simply, books and published reports can be managed as individual items, but records must be managed in the context of the other records and information to which they relate if they are to serve their purpose of documenting official processes and actions. A sequence of records may be needed to provide an audit trail for a transaction or decision.

Context and traceability are the core elements for dataset authenticity and reliability. Once data has context wrapped around it, it becomes a record. For instance, expenditure data becomes a basis for transparency and accountability when it is possible to demonstrate where it came from, who authorised it, who saw it and how the money was spent. If datasets are separated from the records from which they are derived, the context is lost. Making data available without context can compromise the value of the information and in some cases make it unusable. When citizens doubt the reliability of the data, the goals of openness and trust in government are undermined. When Open Data takes account of the nature and quality of the data sources, it becomes a powerful tool for audit, accountability and anti-corruption, and for business development.

Issue Three:  Challenges of the Digital Environment

Digital records are extremely fragile.  Their integrity depends upon a quickly changing array of hardware and software. Unless these records are carefully managed and protected, governments cannot guarantee their availability, authenticity, and usability over time and across sites, with the result that data will be incomplete and untraceable.  Governments often focus on the dramatic benefits of digital systems without recognising the challenges of ensuring the integrity of the digital information generated by these systems.

The risk is that in the digital environment, if records are not managed professionally, their availability and integrity, and their value as legal evidence and an authoritative source for Open Data initiatives, can be compromised. They may not remain accessible if they are not migrated to new software or refreshed to newer hardware environments, and they may not be linked to related paper records to provide complete information.  They may be held in multiple locations so that it is not possible to know which is the final or authoritative version; they may be stored on unmanaged network drives or on personal drives where they are unavailable as a corporate resource.   Digital records will not remain accessible unless they are captured and held in a safe, neutral and professionally managed repository and supported by complete metadata. A leading digital records expert estimates that there is a six-month half-life preservation opportunity: every six months there is a 50% decline in the likelihood of being able to preserve the reliability and accessibility of the record if there are any problems related to the transfer4.
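The six-month half-life can be written out as a simple decay formula. This is an illustrative reading of the estimate, not a formula given by the expert: it assumes the decline compounds steadily, with P_0 the likelihood of successful preservation at the time of transfer and t the time elapsed in months.

```latex
P(t) = P_0 \left(\tfrac{1}{2}\right)^{t/6}
\qquad\Rightarrow\qquad
P(6) = 0.5\,P_0,\quad
P(12) = 0.25\,P_0,\quad
P(24) = 0.0625\,P_0
```

On this reading, a transfer problem left unresolved for two years leaves roughly a 6% chance, relative to the starting likelihood, of preserving the record's reliability and accessibility.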

Governments and international organisations often decide that digitisation will be a quick means of making records accessible and ending dependence on paper records. However, digitisation initiatives often fail to put in place a management and quality control framework to ensure that the digitised records meet requirements for legal admissibility, reliability and authenticity, and to ensure safe migration of the records from one generation of technologies and formats to the next. For instance, agencies are often unaware of the requirements for image resolution, metadata fields, standardised indexes and classification structures, and retention and disposition schedules. In some cases, government agencies assume that hard copy source records can be destroyed as soon as digital surrogates are created. However, if the scan is poor, the agency and civil society are at risk. Where the agencies keep the hard copy source records ‘just in case’, there can be confusion about which records are ‘official’ or ‘original’. In some cases, organisations destroy the source records after scanning but create an audit trail to confirm traceability back to the source records. However, reliable techniques for ‘traceability’ have not yet been established.

Issue Four:  Significance of Trusted Digital Repositories for Openness

Trusted digital repositories (TDRs) are an internationally accepted, technology-neutral means of ensuring long-term access to digital records and datasets as assets and protecting their integrity, completeness, trustworthiness and traceability. TDRs can be designed and managed professionally to capture and provide access to authentic data and digital records; link active and inactive datasets to hard copy or digital records that provide context; provide the means to interpret, interrelate and restructure data; and support migration to new software and hardware environments.  Where this is achieved, citizens can have confidence that the records and data have not been tampered with or otherwise compromised.  The TDR can be a transparency portal for public access. At present, there are few TDRs outside Europe, North America and Australia/ New Zealand.

National archives, with their statutory responsibility for protecting and preserving public sector records, are, by international good practice, the appropriate home of TDRs. The National Archives of Norway is an outstanding example of what is possible. Norway has introduced powerful legislation, records standards and well-defined metadata architectures to guide the process of on-going access to digital records and data. The goal is to ensure that the information provided is complete, accurate, timely, authoritative and above all trustworthy. Citizens expect requests for information to be answered rapidly and protest if they are not. Recently, the National Archives has made its TDR accessible by smartphone to enhance citizen access and government transparency. While specific to Norway, the approach could be adapted to support developing country requirements and resources.

In addition to the work in Norway and by national archives across Europe, North America and Australia/ New Zealand, there is significant international work underway on TDR accountability requirements.  For instance, minimum requirements for digital repositories have been defined to include policies, strategies and procedures relating to service levels, legal permissions, the ingest of digital assets, testing compliance and data quality, preservation, migration and access.  There is a significant body of information upon which the Open Data movement can draw5.

Issue Five:  The Need for a Community of Good Practice for Public Records and Open Data

Solutions developed and tested in North America, Europe and Australia/ New Zealand, endorsed by professional bodies, and articulated in standards can provide a pathway for developing the regulatory frameworks and skill sets needed to create the complete, accurate and trustworthy records upon which Open Government and Open Data depend.  It is the international norm that national archives should be the lead organisation for developing and implementing records management strategies, working in partnership with others, especially government ICT organisations, access to information offices and audit authorities.  They should, according to the International Council on Archives (ICA), facilitate policies, procedures, systems, standards and practices designed to assist records creators in creating and retaining authentic, reliable and preservable records.

The Archivist of the United States is working with the ICA Forum of National Archivists, chaired by the National Archivist of Canada, to engage national archivists across the world in supporting the objectives of open government. The US National Action Plan for the Open Government Partnership has opened the way for introducing good records management as a foundation for effective open government:

The backbone of a transparent and accountable government is strong records management that documents the decisions and actions of the Federal Government. The transition to digital information creates new opportunities for records management, but much of government still relies on out-dated systems and policies designed during a paper based world. To meet current challenges, the U.S. will:  Reform Records Management Policies and Practices Across the Executive Branch. … The initiative will seek a reformed, digital-era, government wide records management framework that promotes accountability and performance.

Failure to address the records issue will undermine the long-term success of Open Data initiatives and more generally of Open Government. However, if the Open Data and records management communities can find common ground in the Open Government environment, there is a significant opportunity to enhance openness internationally. A valuable first step would be collaborative research on the risks involved in releasing untraceable data, for instance in relation to education and healthcare. This could be followed by the development of a good practice methodology for testing data integrity and accuracy. Ultimately, this collaboration can lay the foundation for trust and genuine openness.

Footnotes

1 - Discussion June 2011

2 - Particularly:  

3 - International Records Management Trust project reports

4 - Discussion with Ken Thibodeau, June 2011.  Dr. Thibodeau directed the Center for Electronic Records at the US National Archives and Records Administration in the 1990s, a period when yearly transfers of electronic records increased more than 100 fold.

5 - These include, for instance: