Introduction to Data Science: What Is It and Why Now?

By: Claudia Flautero
November 09, 2020

In this blog post you will find answers about what data science is and why it has become so popular in recent years.

What is Data Science?

Data science is a set of methodologies for taking in the many forms of data that are available to the public today and analyzing them to draw meaningful conclusions.

Data is collected everywhere and can be used for different purposes, such as:

  • Describe the current state of things, organizations, or a particular process.
  • Diagnose the cause of particular behaviors.
  • Detect events that are anomalous or deviate from the normal process, like irregular credit card purchases.
  • Predict future events.
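To make the "detect" purpose a little more concrete, here is a minimal sketch of one simple way to flag unusual purchases: measure how far each amount sits from the average, in units of standard deviation (a z-score). The purchase amounts, the function name find_anomalies, and the threshold of 2.0 are all assumptions made up for illustration; real fraud detection relies on far richer data and methods.

```python
from statistics import mean, pstdev


def find_anomalies(amounts, threshold=2.0):
    """Return the amounts whose distance from the mean exceeds
    `threshold` standard deviations (a simple z-score rule)."""
    average = mean(amounts)
    spread = pstdev(amounts)  # population standard deviation
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [a for a in amounts if abs(a - average) / spread > threshold]


# Hypothetical purchase amounts; one is far larger than the rest.
purchases = [25.0, 40.0, 32.5, 28.0, 35.0, 30.0, 950.0, 27.5]
print(find_anomalies(purchases))
```

A rule like this is easy to explain, but a single extreme value also shifts the mean and the standard deviation, so it is best seen as a starting point.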

Every day, around the world, more data is being collected than ever before. Many of the transactions a person makes can be linked through their email address to their social media accounts. Their preferences, past or upcoming vacation destinations, and what they can or cannot afford may be recorded as well. All these pieces of information are available and valuable to companies and governments.

What happens after all this data is collected?

After the raw data is collected and put together, it must be analyzed to extract valuable information from it. The raw data is important, but the analyses and conclusions that can be drawn from it are far more interesting than the unprocessed data alone.

The data science workflow consists of the following steps:

  1. Collecting the raw data. There are many different ways that data can be collected.

  2. Cleaning the raw data. All the information collected must be processed, for example to eliminate duplicates or find missing data.

  3. Analysis, exploration and visualization. Here the data is analyzed, which could involve creating graphics to observe trends, or comparing different sets of data to find answers to specific questions or to identify trends in the population.

  4. Experimentation and use for prediction. Once the data is analyzed and information is extracted from it, it can be used to build a system that predicts future events, like trends in fashion, music genres that are becoming more popular, or even the development and evolution of diseases and syndromes. That insight is important since it can be used for purposes like preventing wildfires or stopping a disease at a key point to prevent further adverse events for a patient.
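The cleaning step in the workflow above can be sketched in a few lines of plain Python. This is only an illustration under assumed conditions: the records, the field names "customer" and "amount", and the function name clean_records are hypothetical, and missing values are assumed to be stored as None.

```python
def clean_records(records, required_fields):
    """Split records into unique complete ones and incomplete ones.

    A record is incomplete if any required field is missing or None.
    Two complete records are duplicates if all required fields match.
    """
    seen = set()
    cleaned = []
    incomplete = []
    for record in records:
        values = tuple(record.get(field) for field in required_fields)
        if any(value is None for value in values):
            incomplete.append(record)  # set aside for review
            continue
        if values in seen:
            continue  # exact duplicate of an earlier record, skip it
        seen.add(values)
        cleaned.append(record)
    return cleaned, incomplete


# Hypothetical raw data with one duplicate and one missing value.
raw = [
    {"customer": "A", "amount": 10.0},
    {"customer": "A", "amount": 10.0},
    {"customer": "B", "amount": None},
    {"customer": "C", "amount": 7.5},
]
cleaned, incomplete = clean_records(raw, ["customer", "amount"])
print(cleaned)
print(incomplete)
```

Setting incomplete records aside, rather than deleting them, keeps the option of filling in the missing values later.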

Applications of Data Science

Data science can be applied to many different real-life problems.

Three main areas of data science are:

  • Traditional machine learning:

For machine learning, it is important to have a well-defined question, a set of example data that provides enough information to build an algorithm, and a new set of data to apply the algorithm to.

Machine learning can be used to build algorithms that help in different situations, like predicting credit card fraud or identity theft, or the point in time when a machine part is going to malfunction, among many other uses.
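The three ingredients just described (a question, example data, and new data) can be shown with one of the simplest models there is: a straight line fitted by least squares. The sketch below is only an illustration. The question ("how worn will a part be after a given number of hours?"), the numbers, and the function names fit_line and predict are all hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the examples by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Example data (hypothetical): hours of use and measured wear in mm.
hours_used = [100, 200, 300, 400]
wear_mm = [0.5, 1.0, 1.5, 2.0]

model = fit_line(hours_used, wear_mm)  # learn from the examples
print(predict(model, 500))             # apply to new, unseen data
```

Real projects usually rely on a library for this step, but the pattern of fitting on examples and then predicting on new data stays the same.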

  • Internet of Things (IoT):

IoT refers to devices that are not traditional computers but are able to collect and transmit data, which makes them a great resource for data science. Examples include a home security system, a smartwatch, a car's GPS, street cameras, and many more.

  • Deep learning:

Deep learning is a sub-field of machine learning. A deep learning model is built from multiple layers of simple computing units called neurons, and all the neurons work together to extract important information from the data. Deep learning typically needs more data than traditional machine learning models, but it can also learn relationships that traditional models cannot. Deep learning is used in more complex situations, like learning and distinguishing between languages, identifying and classifying images, and others.
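To give a rough idea of what "layers of neurons" means, the sketch below passes two input numbers through a tiny network with one hidden layer of three neurons and one output neuron. Everything here is an assumption for illustration: the weights are made-up numbers rather than learned ones, the sigmoid activation is just one common choice, and the training step that real deep learning depends on is left out entirely.

```python
import math


def sigmoid(value):
    """Squash any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, then sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical, hand-picked weights (a trained network would learn these).
network = [
    # hidden layer: 3 neurons, each with 2 input weights
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    # output layer: 1 neuron with 3 input weights
    ([[0.7, -0.5, 0.2]], [0.05]),
]
print(forward([1.0, 2.0], network))
```

In practice, libraries such as TensorFlow handle both this forward pass and the training that adjusts the weights.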

Some extra information:

The data science workflow is generally not carried out by one person. Teams are formed so that different people work on different steps of the process, each bringing different skills to the project. For example, someone who collects data might be proficient in languages like SQL, Java or Scala, while someone who works with machine learning and deep learning has to be proficient in languages like Python or R and in the machine learning libraries used to create complex algorithms, like TensorFlow. Other valuable skills are understanding basic statistics, working with spreadsheets to create simple visualizations for people outside the data science field, and using Business Intelligence tools to create dashboards and visualizations.