
Data Conversion

To sonify the data derived from the discourse analysis, we had to convert words into numbers. We began by returning to the Excel spreadsheet in which the research team tracked how data breaches were framed as crises through the language and expressions used to describe the victims, the perpetrators, the breach, and the company or organization implicated in the incident. We then assessed each category separately to devise a scoring framework.

Scoring Framework

We developed the following scoring framework, informed by the insights generated through the discourse analysis.

Perpetrators refers to the intensity of language used to describe the individuals or entities deemed responsible for the data breach. Based on our research, we found that language ranged in intensity, with the use of the word ‘hacker’ evolving over time to be applied as shorthand to describe the breach as unauthorized and compromising.

  • 0 = Unknown
  • 1 = Neutral language
  • 2 = Demonstrates Intent (fraudster, phisher)
  • 3 = HACKER (clear cut, trope)
  • 4 = More criminal intent than the more generalized use of hacker
  • 5 = Larger nefarious criminality, more organized, identified group or individual

Company tracks whether a company was named or not. Because all the breaches we studied affected mid- to large-sized organizations, it was important for us to track when and how companies or organizations were named explicitly in sources. We scored a named company as 2, knowing this would produce a higher-pitched sound and make those infrequent instances when an entity was named more audible.

Company (named or not)

  • 1 = Not named (No)
  • 2 = Named (Yes)
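To illustrate why a score of 2 sounds higher than a score of 1, here is a minimal sketch of one possible score-to-pitch mapping. The function names, the base note, and the two-semitone step are hypothetical choices made for illustration; the mapping actually used in the sonification is not described in this section. Only the MIDI-to-frequency formula (A4 = note 69 = 440 Hz) is standard.

```python
def score_to_midi(score, base_note=60, step=2):
    """Hypothetical mapping: each score point raises the pitch by `step` semitones.

    base_note=60 (middle C) and step=2 are assumptions, not project settings.
    """
    return base_note + step * score


def midi_to_hz(note):
    """Standard equal-temperament conversion from a MIDI note number to hertz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Company scores from the framework: 1 = not named, 2 = named.
not_named_hz = midi_to_hz(score_to_midi(1))
named_hz = midi_to_hz(score_to_midi(2))
print(named_hz > not_named_hz)  # the named case is rendered at a higher pitch
```

Under any mapping that increases with the score, the rarer "named" value stands out against the more common "not named" value.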

Breach reflects the intensity of the language and expressions used to describe the data breach event. We observed that in the mid-to-late aughts, the language was more passive and was used inconsistently to describe the event. As breach discourses stabilized in later years, the language used to describe a breach came to correspond to its size or to the sensitivity of the data. For large breaches or those involving sensitive data, the language used to describe the breach scored higher on the scale.

Breach (intensity of the language used to describe the breach)

  • 0 = Unknown
  • 1 = Passive language (exposed, unprotected, revealed)
  • 2 = Leaking (accessed directly)
  • 3 = Breach
  • 4 = Major breach
  • 5 = Catastrophe

Data represents the sensitivity of the data compromised in a data breach. We devised the following scale to account for the ways sensitivity was described in the sources we examined in the discourse analysis. We wanted to understand how sensitivity corresponded to other variables, such as perpetrators and breach.

Data (sensitivity of data)

  • 0 = Unknown
  • 1 = Public already
  • 2 = Sensitive
  • 3 = Very sensitive

Risk considers the language used to describe the risk to those impacted by the data breach, such as consumers whose accounts were compromised. We consistently documented instances in which the language used to describe the risk was intense even though the risk itself was still unknown. We scored these instances in the mid-range to account for the intensity of the language and expressions used.

Risk (The specific language used to describe the risk)

  • 0 = Unknown
  • 1 = Low Risk (describes a possible or projected risk; the data is already publicly available)
  • 2 = Medium risk (a known risk is identified; language is intense, but the risk is unknown)
  • 3 = High risk (immediate concern and impact; a concrete risk is named)

Risk to company tracks whether a risk to the company was articulated or not. We felt this was important to discern in relation to other company data and the broader framing of the data breach as a security crisis. As with the company-named variable, we scored an articulated risk to a company as 2 to produce a higher-pitched sound and make more audible those instances when risk was ascribed to an organization.

Risk to Company

  • 1 = No
  • 2 = Yes
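The scales above can be collected into a single lookup table, which makes it easy to check that every score entered in the spreadsheet falls on its scale. The following is a minimal sketch: the variable keys, the `validate_row` helper, and the example row are hypothetical and not taken from the project's spreadsheet; only the score values and their labels come from the framework above.

```python
# Allowed scores per variable, transcribed from the scoring framework.
# The dictionary keys are illustrative names, not the spreadsheet's column names.
FRAMEWORK = {
    "perpetrators": {
        0: "Unknown",
        1: "Neutral language",
        2: "Demonstrates intent",
        3: "Hacker",
        4: "More criminal intent than generalized 'hacker'",
        5: "Larger, more organized criminality; identified group or individual",
    },
    "company": {1: "Not named", 2: "Named"},
    "breach": {
        0: "Unknown",
        1: "Passive language",
        2: "Leaking",
        3: "Breach",
        4: "Major breach",
        5: "Catastrophe",
    },
    "data": {0: "Unknown", 1: "Public already", 2: "Sensitive", 3: "Very sensitive"},
    "risk": {0: "Unknown", 1: "Low risk", 2: "Medium risk", 3: "High risk"},
    "risk_to_company": {1: "No", 2: "Yes"},
}


def validate_row(row):
    """Return a list of problems found in one scored case (empty if none)."""
    problems = []
    for variable, scale in FRAMEWORK.items():
        if variable not in row:
            problems.append(f"{variable}: missing")
        elif row[variable] not in scale:
            problems.append(f"{variable}: {row[variable]!r} is not on the scale")
    return problems


# A made-up row for illustration only; it does not describe a real case.
example = {
    "perpetrators": 3,
    "company": 2,
    "breach": 4,
    "data": 2,
    "risk": 2,
    "risk_to_company": 1,
}
print(validate_row(example))
```

Note that the two yes/no variables start at 1 rather than 0, so a 0 in either column would be flagged as off the scale.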

Size is not represented in the framework because it was already recorded numerically in the Excel spreadsheet, but it was used in the sonification. It represents the number of accounts compromised in each data breach case. If a source did not reveal this information, we entered a value of ‘0’ in the spreadsheet. Remarkably, we encountered this in only one case, the Twitch breach in 2021. Whereas the reporting on all the other cases revealed the number of accounts compromised, the reporting on the Twitch case focused on the amount of data leaked (1000GB). We were interested in tracking the size of the data breach to understand how size corresponded to the other variables.
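The rule for unreported sizes can be sketched as a small parsing step. This is an illustration under stated assumptions: `parse_size` is a hypothetical helper, and the cell formats it handles (thousands separators, blank cells, non-numeric text) are guesses about what a spreadsheet export might contain, not a description of the project's actual data.

```python
def parse_size(cell):
    """Return the number of compromised accounts as an int, or 0 if not reported.

    Hypothetical helper: anything that cannot be read as a count of accounts
    (blank cells, or a data volume such as "1000GB") is recorded as 0.
    """
    if cell is None:
        return 0
    text = str(cell).replace(",", "").strip()
    if not text:
        return 0
    try:
        return int(float(text))
    except (ValueError, OverflowError):
        return 0


print(parse_size("3,000,000"))  # a reported account count
print(parse_size("1000GB"))     # a data volume, not an account count
```

Because 0 here means "not reported" rather than "no accounts affected", any later step that maps size to sound should treat 0 as a special case.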