I first met the confusion matrix during the first year of my Data Science master's degree. The first time the professor explained it, I felt nothing but CONFUSION! For this reason, I want to explain in simple words the concepts behind this matrix, which will be your partner every time you need to evaluate the performance of a model.
So, what is a Confusion Matrix? Why do we need it? In general, it's a tool that helps you understand whether the model is working well. Moreover, from it you can derive many evaluation measures, such as accuracy, precision…
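To make the idea concrete, here is a minimal sketch of how the four cells of a binary confusion matrix can be counted and how accuracy and precision follow from them. The labels are made up for illustration, and the helper names (`confusion_counts`, `accuracy`, `precision`) are my own assumptions, not part of any library.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("accuracy :", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
```

In a real project you would probably rely on a library implementation rather than hand-written helpers, but counting the cells by hand shows where each metric comes from.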
Machine learning methods often fail to model the data because they learn particular features of the training set that are not present in the test set. These features are not representative, and we end up in a situation of overfitting: the model fits the training data too closely and is not able to generalize to new samples. There are many ways to address this issue, such as regularization, selection of the best hyperparameters, and K-fold cross-validation. In this article, I focus on this last topic because I think it has a relevant role to have…
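As a rough illustration of the splitting step behind K-fold cross-validation, the sketch below divides sample indices into K contiguous folds, using each fold once for validation and the rest for training. The function name `k_fold_indices` is hypothetical, and the sketch does no shuffling, which you would normally want before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for i, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx} val={val_idx}")
```

Every sample lands in exactly one validation fold, so the model is always evaluated on data it did not see during that round of training.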
PyTorch is a deep learning framework developed by Facebook's AI Research Lab and first released in 2016. It's widely used for computer vision applications. Moreover, it's characterized by simplicity, strong GPU support, and built-in implementations of deep learning algorithms. Thanks to these features, it's also one of the most used libraries in academic research.
There are a lot of PyTorch tutorials on the web and a lot of documentation on the PyTorch website. But too much information can be confusing and cost you a lot of time. My goal is to show an overview of the basic functions and classes available in PyTorch along with…
Speech recognition is an interesting task that can improve your quality of life. In this never-ending Covid period, I need to watch many recorded lessons, and it's so easy to lose concentration. At the same time, having all the recordings available on my university's website has turned me into a perfectionist, so I would like to capture every word in my notes. But that is costly: it takes a lot of work and a lot of time.
Luckily, there are already APIs available from providers such as Google, Amazon, IBM, and many others, which offer services that convert…
Generalized Additive Models (GAMs) are extensions of linear models that allow modeling nonlinear relationships in a flexible way. Moreover, GAMs are a middle ground between simple models such as linear regression and more complex models like gradient boosting.
Linear models are easy to interpret, are used for inference, and make it possible to understand the linear relationship between the predictors and the response, but they can suffer from high bias. …
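To see what "extension of the linear model" means, it can help to write the two models side by side. The notation below is the usual textbook one and is my own choice here: $\beta_j$ are coefficients, $f_j$ are smooth functions estimated from the data, and $g$ is a link function.

```latex
% Linear model: each predictor contributes through a straight line
y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon

% GAM: each linear term \beta_j x_j is replaced by a smooth function f_j(x_j)
g\bigl(\mathbb{E}[y \mid x_1, \dots, x_p]\bigr) = \beta_0 + f_1(x_1) + \dots + f_p(x_p)
```

The model stays additive, so the effect of each predictor can still be inspected on its own, while each $f_j$ is free to bend where the data calls for it.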
If you have Windows 10 and you need to install Ubuntu for study or work purposes, you are in the right place. Have you tried to install Ubuntu in VirtualBox in the past? If you have, your computer probably became very slow, it took too much effort to run, and you lost a lot of time.
Forget about it. There is a way that doesn't fill all your computer's memory and is very fast to use. You only need to install the Ubuntu shell from the Microsoft Store and it's done, without any complicated steps.
The Ubuntu Shell can be installed…
Working with dates and times in Python is not straightforward when you analyze a dataset for the first time. There are a lot of features to take into account, such as the year, the month, the day, the hour, the minutes, and the seconds, but also more complex ones such as durations, weekdays, and time zones. For this reason, I will talk about a Python module that manipulates this type of information: datetime [1].
Datasets often have dates represented as strings, and you need to convert them to datetime format in order to work with time series data. …
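As a small sketch of that conversion, the example below parses a date string with `datetime.strptime` and then reads off a few of the features mentioned above. The sample strings and the format `"%Y-%m-%d %H:%M:%S"` are assumptions for illustration; a real dataset may need a different format string.

```python
from datetime import datetime

# Hypothetical raw values, as they might appear in a CSV column.
raw_start = "2021-03-15 14:30:05"
raw_end = "2021-03-17 16:00:05"

# The format string must match the layout of the text exactly.
fmt = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime(raw_start, fmt)
end = datetime.strptime(raw_end, fmt)

# Simple features are available as attributes.
print(start.year, start.month, start.day, start.hour, start.minute, start.second)

# weekday() counts from Monday = 0 to Sunday = 6.
print("weekday:", start.weekday())

# Subtracting two datetimes gives a timedelta, i.e. a duration.
duration = end - start
print("duration:", duration, "=", duration.total_seconds(), "seconds")
```

If a string does not match the format, `strptime` raises a `ValueError`, which is a useful early warning about inconsistent date formats in a dataset.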
PyTorch is a Python package used to develop deep learning models with maximum flexibility and speed. PyTorch is built around Tensors, which are essentially n-dimensional arrays used for matrix computations, so they are similar to NumPy arrays. The advantage of using a PyTorch Tensor instead of a NumPy array is that a PyTorch Tensor can run on a GPU [1].
Installing PyTorch is not so hard in itself, but the steps to enable the GPU on a local machine are not trivial. …
Social networks are a constant part of our lives nowadays. Their popularity can be explained by accessibility and convenience, which allow users to provide huge amounts of information with limited or no restrictions on content. This continuous and rich mass of data is made available by these platforms and can be used to study sentiment about brands, products, events, recent news, and social and political issues.
In this Covid-19 period, there has been a dramatic growth in the use of these platforms. Twitter, in particular, has increasingly been used to spread misinformation related to the pandemic. For this reason, I am going to…
The goal of this post is to use a tool to train and evaluate a PyTorch model in a simple way. This tool is Skorch, a scikit-learn compatible neural network library that wraps PyTorch, which makes it possible to use PyTorch with sklearn. Moreover, it takes advantage of scikit-learn's functions such as fit, predict and GridSearch [1]. This tool is applied to MNIST, a dataset composed of images of handwritten digits: 60,000 for training and 10,000 for testing. …
I am a Data Science student and a travel enthusiast | I learn something new every day | https://www.linkedin.com/in/eugenia-anello-545711146