The best way to learn is by doing projects. This is especially true for self-learners who have picked up some programming and data science tools and are wondering what to do next. My suggestion: start doing some projects of your own. That is the best way to learn.
Also, for a beginner with no industry experience, a few practice projects can make a good portfolio. But it may be hard for a beginner to come up with a good project idea, and this article may help in that case. Here I am compiling projects of different kinds, from exploratory data analysis to machine learning and deep…
When I first learned to make pair plots, I used them in every project for a while, and I still use them a lot. A pair plot presents several bivariate distributions in one figure and can be made with just one simple function. Even the most basic one is very useful in a data analytics project with several continuous variables. We know that a scatter plot is widely used to present the relationship between two continuous variables. A pair plot puts several scatter plots in one figure and also shows each variable's distribution along the diagonal. …
Before I started this MS program, I was looking through the curricula of different Master's programs and reading other people's reviews to figure out which program would suit me. Now, as I am almost done with my MS, I thought I should write a review to help other learners who are looking for an MS program in Data Science or Analytics.
Before I dive into the MS program, here is my background. I have a Bachelor's in Civil Engineering and a Master's in Environmental Engineering, so I am not from a computer science background. I did a…
Sentiment analysis is one of the most common natural language processing tasks. Businesses use it to make sense of social media comments, product reviews, and other text data efficiently. TensorFlow and Keras are amazing tools for that.
TensorFlow is arguably the most popular deep learning library. It uses neural networks behind the scenes. It is so popular because it is really easy to use and works pretty fast. You can run a neural network without even knowing how one works, though it helps if you know some basics.
TensorFlow also has very good…
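To give a flavor of how little code a Keras sentiment model needs, here is a minimal sketch of a binary text classifier. It assumes the reviews have already been tokenized into integer sequences; the vocabulary size, layer sizes, and the random toy data are all illustrative assumptions, not the article's actual model:

```python
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 1000, 20

# Toy stand-in data: random token ids with random 0/1 sentiment labels.
rng = np.random.default_rng(0)
x = rng.integers(1, vocab_size, size=(64, seq_len))
y = rng.integers(0, 2, size=(64,))

# Embed tokens, average them, and classify with a single sigmoid unit.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=2, verbose=0)

probs = model.predict(x[:1], verbose=0)  # shape (1, 1): one probability per review
```

On real data you would tokenize with something like `tf.keras.layers.TextVectorization` first and train for more epochs.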
Performance evaluation is, in my opinion, the most important part of machine learning. Building a model has become pretty easy thanks to all the libraries and packages; anyone can train a model without knowing much about what is going on behind the scenes. Performance evaluation, then, can be the real challenge: how do you evaluate the performance of that machine learning model?
Software like Weka provides many performance evaluation metrics automatically as you build the model. But in other tools, such as sklearn or R packages, evaluation metrics do not come automatically with the model. …
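In sklearn, for example, you compute the metrics yourself after fitting. A small sketch using the built-in iris dataset and a logistic regression (both chosen here just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# None of these are reported automatically; you call each one explicitly.
acc = accuracy_score(y_test, pred)
print(acc)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # precision, recall, F1 per class
```

The upside of this explicitness is that you choose metrics appropriate to the problem rather than accepting a default report.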
Geospatial data can be fascinating. A single interactive geospatial visualization conveys a lot about the data and the area it covers. Python has so many libraries that it is hard to know which one to use; for geospatial visualization, I will use Folium. It is very easy to use and offers several map styles to match your taste and requirements. Let's start.
I used a Jupyter Notebook environment for this. If you are using a Jupyter Notebook, install folium from the Anaconda prompt with the following command:
conda install -c conda-forge folium=0.5.0 --yes
Feature selection is one of the most important parts of machine learning. Real-world datasets often have many features, but not all of them are necessary for a given machine learning algorithm. Using too many unnecessary features can cause several problems. The first is computation cost: an unnecessarily big dataset takes an unnecessarily long time to run through the algorithm. At the same time, it can lead to overfitting, which is not desirable at all.
There are several feature selection methods out there. I will demonstrate four popular feature…
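The excerpt cuts off before naming the four methods, so as a stand-in, here is a sketch of one common approach, univariate selection with sklearn's `SelectKBest` (this particular method and the iris dataset are my assumptions, not necessarily among the article's four):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature against the target with an ANOVA F-test
# and keep only the 2 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, "->", X_new.shape)  # (150, 4) -> (150, 2)
print(selector.get_support())      # boolean mask of the kept features
```

Dropping the low-scoring features shrinks the matrix the algorithm has to process, which addresses both the computation-cost and overfitting concerns above.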
Stochastic gradient descent is a widely used approach in machine learning and deep learning. This article explains stochastic gradient descent using a single perceptron and the famous iris dataset. I am assuming that you already know the basics of gradient descent. If you need a refresher, please check out this linear regression tutorial, which explains gradient descent with a simple machine learning problem.
Before diving into stochastic gradient descent, let's review regular gradient descent. Gradient descent is an iterative algorithm. Let's work through a simple example. As I mentioned, I will use a single perceptron:
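To make the "stochastic" part concrete, here is a sketch of SGD training a single sigmoid perceptron on toy linearly separable data (the data, learning rate, and epoch count are illustrative assumptions; the article uses iris):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-feature data: the class is 1 when x1 + x2 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # perceptron weights
b = 0.0           # bias
lr = 0.1          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(20):
    # "Stochastic": update after EACH sample, in shuffled order,
    # instead of averaging the gradient over the whole dataset.
    for i in rng.permutation(len(X)):
        pred = sigmoid(X[i] @ w + b)
        err = pred - y[i]          # gradient of log-loss w.r.t. pre-activation
        w -= lr * err * X[i]
        b -= lr * err

acc = np.mean((sigmoid(X @ w + b) > 0.5).astype(float) == y)
```

Regular (batch) gradient descent would compute `err` averaged over all 200 samples before each single update; the per-sample updates are what make SGD cheap per step and noisy in direction.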
The pandas library is a very popular Python library for data analysis, with a great many functions. This article will discuss three very useful and widely used functions for data summarization. I will explain them with examples so we can use them to their full potential.
The three functions I am talking about today are count, value_counts, and crosstab.
The count function is the simplest, value_counts can do a bit more, and the crosstab function handles even more complicated work with simple commands.
The famous Titanic dataset is used for this demonstration. …
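To show the three functions side by side without downloading the Titanic file, here is a sketch on a tiny stand-in DataFrame whose columns mimic the Titanic data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 0, 1],
    "Age": [22.0, 38.0, np.nan, 35.0, 28.0],
})

# count: number of non-null values in a column.
n_ages = df["Age"].count()        # 4 (the NaN is excluded)

# value_counts: frequency of each distinct value.
sex_freq = df["Sex"].value_counts()

# crosstab: a contingency table of two categorical columns.
table = pd.crosstab(df["Sex"], df["Survived"])
print(n_ages)
print(sex_freq)
print(table)
```

The progression is exactly the one described above: `count` gives one number, `value_counts` a frequency table for one column, and `crosstab` a two-way table relating two columns.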
Exploratory data analysis is essential for understanding any dataset. It includes data summarization, visualization, some statistical analysis, and predictive analysis. This article will focus on data storytelling, or exploratory data analysis, using R and several of its packages.
This article will cover:
2. Some Basic Statistics
3. Predictive Model
If you regularly follow my articles, you may have seen another exploratory data analysis project on the same dataset, done in Python. Here is the link:
I am using the same dataset here for performing an exploratory data analysis in…