What is data science?

Gaining insights from data through computation, statistics and visualisation.

The art of taking data and being able to understand it, process it, visualise it and communicate it.

 

What is the process of data science?

  • ASK an interesting question.

Sometimes the questions only emerge after looking at the data, which is why the next step is to explore it.

  • EXPLORE the data

Find out what the data is telling you. One way to do this is to put the data on many plots and study them to see what patterns they reveal. The more plots, the better. This is a skill you learn by doing.

  • MODEL the data

The most fun part of data science.

  • COMMUNICATE/VISUALISE the results

Communicating the answer to the question.

 

Note: the process above is iterative and moves both forward and backward. After visualising the results, it is normal to go back a phase and correct things.

ASK <—> EXPLORE <—> MODEL <—> VISUALISE
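The four stages can be illustrated with a minimal Python sketch on made-up data. The study-hours numbers and the helper names `explore`, `fit_line` and `communicate` are hypothetical. Exploration here is only summary statistics plus a crude text bar chart, and the model is an ordinary least-squares straight line; a real project would use proper plots and would likely loop back through these stages several times.

```python
import statistics

# Hypothetical toy data (made up for illustration): hours studied vs exam score.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 73, 79, 82]

# ASK: do students who study longer tend to score higher?


def explore(xs, ys):
    """EXPLORE: print quick summaries and a crude text 'plot' of the data."""
    print(f"hours:  mean={statistics.mean(xs):.2f}, median={statistics.median(xs)}")
    print(f"scores: mean={statistics.mean(ys):.2f}, median={statistics.median(ys)}")
    for x, y in zip(xs, ys):
        # One '#' per 5 points of score: a poor man's bar chart.
        print(f"{x:>2} | {'#' * (y // 5)}")


def fit_line(xs, ys):
    """MODEL: ordinary least-squares line y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def communicate(slope, intercept):
    """COMMUNICATE: turn the fitted model into a plain-language sentence."""
    return (f"Each extra hour of study goes with about {slope:.1f} more points "
            f"(baseline about {intercept:.1f}).")


if __name__ == "__main__":
    explore(hours, scores)
    slope, intercept = fit_line(hours, scores)
    print(communicate(slope, intercept))
```

If the sentence printed at the end does not answer the original question well, that is the cue to step back to EXPLORE or MODEL.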

 

What does a data scientist do?

Data scientists use their mathematical skills to find insights in large amounts of data. These insights are then used to create data-driven products, or are visualised so they can be communicated effectively.
They take raw data from the real world and process it into a dataset, which is then used for analysis or in predictive models.
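Turning raw data into a usable dataset usually means parsing it, normalising it and dropping records that cannot be used. The sketch below assumes a small hypothetical CSV export (the `RAW` text and the `build_dataset` helper are made up for illustration) and uses only Python's standard `csv` module.

```python
import csv
import io

# Hypothetical raw export: stray spaces, a missing age, a non-numeric age,
# and inconsistent capitalisation of city names.
RAW = """name, age ,city
Alice, 34, London
Bob,  , Paris
Carol, abc, Berlin
Dave, 29 , london
"""


def build_dataset(raw_text):
    """Parse raw CSV text into clean records, dropping rows that cannot be used."""
    reader = csv.DictReader(io.StringIO(raw_text))
    dataset = []
    for row in reader:
        # Trim whitespace from header names and values; ignore overflow fields (key None).
        row = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        try:
            age = int(row["age"])
        except ValueError:
            continue  # skip rows whose age is missing or not a number
        dataset.append({"name": row["name"], "age": age, "city": row["city"].title()})
    return dataset


if __name__ == "__main__":
    for record in build_dataset(RAW):
        print(record)
```

Dropping bad rows is only one choice; depending on the analysis, filling in or flagging missing values may be more appropriate.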

Skills of a data scientist

  • knows which questions to ask
  • can interpret the data well
  • understands the structure of data
  • works well in a team (data scientists often work in teams)

A data scientist is

  • Software Engineer - hacking skills
  • Statistician - math and statistics knowledge, often uses R
  • Domain expert - expertise in the problem domain

 

How do you know if you are a data scientist?

  • know how to deal with data
  • use many data sources
  • understand how data was collected
  • understand what is important
  • use statistical models (going beyond Excel)
  • understand correlations (variables that trend together)
  • think like a Bayesian and act like a frequentist
  • good communication skills (what does a 60% probability even mean? How can we validate the conclusion?)
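One common way to put a number on "variables that trend together" is the Pearson correlation coefficient, which ranges from -1 (perfect falling line) through 0 (no linear trend) to +1 (perfect rising line). A minimal stdlib sketch follows; the temperature and sales figures are hypothetical. Pearson only measures linear association, and a strong correlation on its own does not show that one variable causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical daily temperatures (deg C) and ice-cream sales.
    temps = [18, 21, 24, 27, 30]
    sales = [120, 135, 160, 170, 195]
    print(f"r = {pearson(temps, sales):.3f}")
```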

 

Tools for Data Science

Jupyter notebook tutorial

 

More Notes

Median

  • If people are made to stand in increasing order of height, then the height of the middle person is the median height.
    With an odd number of people, the middle person’s height is the median.
    With an even number of people (for example 100), the median is the average height of the two middle people.
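The same rule, written as a small Python function (the heights below are hypothetical). Python's standard library also ships `statistics.median`, which follows the same odd/even rule.

```python
def median(values):
    """Middle value after sorting; mean of the two middle values for an even count."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: exactly one middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair


if __name__ == "__main__":
    heights_cm = [170, 158, 182, 165, 175]  # hypothetical heights
    print(median(heights_cm))           # 5 people -> 170
    print(median(heights_cm + [160]))   # 6 people -> (165 + 170) / 2 = 167.5
```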

Mean/Average

  • Sum of items divided by count.
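As a one-line Python sketch (the `mean` helper name is just for illustration; `statistics.mean` in the standard library does the same job). The second call shows why the median is often preferred for skewed data: a single extreme value drags the mean a long way.

```python
def mean(values):
    """Arithmetic mean: the sum of the items divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean([2, 4, 9]))        # 15 / 3 -> 5.0
    print(mean([1, 2, 3, 100]))   # one outlier pulls the mean up to 26.5
```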

Deviation and Standard deviation
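Using the standard definitions, a deviation is how far a value lies from the mean (x minus the mean), and the standard deviation summarises all the deviations as the square root of their average square. A minimal sketch, assuming the usual convention: divide by n for a whole population, or by n - 1 for a sample (Python's `statistics.pstdev` and `statistics.stdev` follow the same two conventions). The helper names and data are hypothetical.

```python
import math


def deviations(values):
    """Distance of each value from the mean (negative = below the mean)."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def std_dev(values, sample=False):
    """Square root of the mean squared deviation (n - 1 divisor if sample=True)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data")
    squared = sum(d * d for d in deviations(values))
    return math.sqrt(squared / (n - 1 if sample else n))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]               # mean is 5
    print(deviations(data))                        # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
    print(std_dev(data))                           # sqrt(32 / 8) -> 2.0
    print(f"{std_dev(data, sample=True):.3f}")     # sqrt(32 / 7) -> 2.138
```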

 

Common data structures

  • Series (pandas lib)
    It's a one-dimensional array of values, labelled by an index.

  • Data Frames (pandas lib)
    Think of a DataFrame as a group of Series that share an index.
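A short pandas sketch of both ideas, assuming pandas is installed; the names, heights and weights are hypothetical. It builds two Series on the same index, combines them into a DataFrame, and shows that pulling a column back out gives a Series that still carries the shared index.

```python
import pandas as pd

# Two Series with the same index (hypothetical data, labelled by name).
heights = pd.Series([170, 158, 182], index=["Ann", "Ben", "Cal"], name="height_cm")
weights = pd.Series([65, 54, 80], index=["Ann", "Ben", "Cal"], name="weight_kg")

# A Series is a 1-D array of values plus an index of labels.
print(heights["Ben"])        # label-based lookup -> 158
print(heights.to_numpy())    # the underlying array of values

# A DataFrame built from Series that share an index: one column per Series.
people = pd.DataFrame({"height_cm": heights, "weight_kg": weights})
print(people)

# Selecting a column gives back a Series that keeps the shared index.
col = people["weight_kg"]
print(type(col).__name__, list(col.index))
```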

Data science jobs involve obtaining data, cleaning it, analysing it and communicating actionable insights to decision makers in organisations.
The title ‘data scientist’ covers a wide range of jobs, from entry-level positions requiring only basic knowledge of databases and Excel, all the way to senior roles that involve inventing new algorithms, working with very large datasets and undirected research into open-ended questions.
Entry-level roles sometimes go by the name of ‘data analyst’ or ‘junior data scientist’.

Links

Bootcamps

Interesting Jupyter Notebooks

  • Melting ice from mountains is like user behaviour generating data.
  • Data lakes and rivers are the pipelines that carry data to the data warehouse.
  • The ocean is the big data sitting in data warehouses.