What exactly is data? The story of data.

The story of data

It was a hot summer day when Lalita began getting ready for her afternoon lectures. ‘Monday, 90°F’, she scribbled in her notebook.

At first glance, ‘Monday, 90°F’ looks ordinary. However, hidden inside it is something powerful. When you compare it with other days, a pattern begins to emerge.

Monday – 90 °F

Tuesday – 85 °F

Wednesday – 87 °F

Thursday – 90 °F

Lalita also noticed that when the temperature was high, she did not need her blazer.

So what exactly is data here?

‘Data’ is the plural of the Latin word ‘datum’, which literally means ‘something given’. Today it refers to any piece of evidence, observation, fact, or record that can be analyzed to understand something.

It can be a number.

It can be a word.

It can be a photo.

It can be a sound.

It can be a location.

It can be a yes-or-no answer.

So, in simple terms, data is the raw material for understanding.

OpenStax defines data as anything we can analyze to compile higher-level insights, and a dataset as a collection of related data points grouped for reference or analysis.
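
To make the idea of a dataset concrete, here is a minimal Python sketch that stores Lalita's four notebook entries together and asks one simple question of them. The variable names (`temperatures_f`, `hottest_days`) are chosen for illustration and are not part of the OpenStax definition.

```python
# Lalita's four notebook entries. Each day/temperature pair is one data point,
# and the whole dictionary is a small dataset.
# Temperatures are in degrees Fahrenheit, as recorded in the story.
temperatures_f = {
    "Monday": 90,
    "Tuesday": 85,
    "Wednesday": 87,
    "Thursday": 90,
}

# A first pattern: which days were the hottest?
hottest = max(temperatures_f.values())
hottest_days = [day for day, temp in temperatures_f.items() if temp == hottest]

print(hottest)       # 90
print(hottest_days)  # ['Monday', 'Thursday']
```

A single entry on its own says little. Grouping the entries is what makes a comparison possible.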

However, data by itself is not wisdom.

A temperature of 87 °F is just a number. It becomes useful only when we connect it to a question, such as “Do I need a jacket at this temperature?”

History of data

The first use of data dates as far back as about 19,000 BC. Back then, our Palaeolithic ancestors used a notched baboon bone, now known as the Ishango bone, which is thought to have served as a tool for tallying and basic calculations.

Our modern understanding of the word ‘data’ dates from 1946, when it was first recorded with the meaning “transmittable and storable computer information”.

Data begins with observation. Long before computers and AI, data began with people noticing things.

A farmer noticed crops grow better after rain.

A trader noticed customers purchase more before festivals.

A doctor noticed certain symptoms appear together.

A teacher noticed students performed poorly after one missed topic.

Each observation became a small piece of data. At first, data was memory; then it became records, then tables, then databases.

Today, data is everywhere: hospitals, banks, smartphones, websites, social media, and more. However, the heart of data has not changed. It begins when something happens and someone records it.

Two simple faces of data

Let us return to Lalita’s story, which contains two types of data. The temperature, 90 °F, is numeric data: it is a measurement. The question ‘Is the blazer needed?’ can only be answered with ‘yes’ or ‘no’, which makes the answer categorical data: it describes a category, not a measurement.

Numeric data asks:

How much?

How many?

How high?

How long?

Categorical data, on the other hand, asks:

Which type?

What group?

Yes or no?

Temperature is numeric data because it can be measured, while “need blazer or not” is categorical data because its only possible values are “yes” and “no”.
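
The distinction can be sketched in Python. The `DailyNote` record and the `kind_of` helper below are hypothetical names invented for this illustration. The rule inside `kind_of` is a deliberate simplification, because in real datasets some numbers (a postal code, for example) are really categories.

```python
from dataclasses import dataclass


@dataclass
class DailyNote:
    day: str              # categorical: which day?
    temperature_f: float  # numeric: how high?
    needs_blazer: bool    # categorical: yes or no?


def kind_of(value) -> str:
    """Label a value as 'numeric' or 'categorical' (a simplified rule of thumb)."""
    # bool is checked first because Python treats True/False as integers.
    if isinstance(value, bool):
        return "categorical"
    if isinstance(value, (int, float)):
        return "numeric"
    return "categorical"


# Monday's entry. needs_blazer=False follows the story's remark that
# Lalita did not need her blazer when the temperature was high.
note = DailyNote(day="Monday", temperature_f=90, needs_blazer=False)

kinds = {name: kind_of(value) for name, value in vars(note).items()}
print(kinds)
# {'day': 'categorical', 'temperature_f': 'numeric', 'needs_blazer': 'categorical'}
```

Here the day of the week is treated as categorical too, since it answers “which group?” rather than “how much?”.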

Data is not information

Data is raw. Information is processed. A list of temperatures is data; the insight “cold days require jackets” is information. A list of patient symptoms is data; the conclusion that the patient is at high risk is information.
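
As a rough sketch of that step from data to information, the Python snippet below processes the four temperatures from Lalita's notebook into a summary and a piece of advice. The 88 °F cut-off (`WARM_THRESHOLD_F`) and the `blazer_needed` function are assumptions made up for this example. The story only says that the blazer was not needed when the temperature was high.

```python
# Raw data: the temperatures Lalita recorded, in degrees Fahrenheit.
temperatures_f = {"Monday": 90, "Tuesday": 85, "Wednesday": 87, "Thursday": 90}

# ASSUMPTION: 88 °F is an invented cut-off for "warm enough to skip the blazer".
# It is not a value from the story.
WARM_THRESHOLD_F = 88


def blazer_needed(temp_f: float, threshold_f: float = WARM_THRESHOLD_F) -> bool:
    """Return True when the temperature is below the assumed warm threshold."""
    return temp_f < threshold_f


# Processing step 1: summarize the numbers.
average_f = sum(temperatures_f.values()) / len(temperatures_f)

# Processing step 2: answer the question "Is the blazer needed?" for each day.
advice = {
    day: ("yes" if blazer_needed(temp) else "no")
    for day, temp in temperatures_f.items()
}

print(average_f)  # 88.0
print(advice)
# {'Monday': 'no', 'Tuesday': 'yes', 'Wednesday': 'yes', 'Thursday': 'no'}
```

The dictionary of temperatures is the data. The average and the yes-or-no advice are information, because they answer a question someone actually asked.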

Data becomes valuable only when we ask questions, organize it, analyze it, and explain what it means. That journey is the beginning of data science.
