Deep Learning

Illustration by Author.

If you have started using PyTorch and built a convolutional neural network, you have probably run into errors about tensor dimensions. You may have gone crazy searching the internet, hoping to find someone who made the same error. But that wastes time, and even after you fix the error, you could make the same mistake again if you don’t have a solid understanding of input and output shapes. This guide will help you understand the dimensions expected by layers such as torch.nn.Conv2d and torch.nn.Linear, which have different input and output dimensions.


  1. Output shapes of nn.Conv2d …
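As a rough illustration of the shape arithmetic, here is a minimal sketch in plain Python. It assumes the output-size formula given in the PyTorch documentation for nn.Conv2d. The helper names conv2d_output_size and linear_in_features are hypothetical (they are not part of PyTorch), and the 28x28 input and 16 output channels are made-up example values.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size along one spatial dimension (height or width).

    Assumed formula, as documented for nn.Conv2d:
    floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def linear_in_features(channels, height, width):
    """Number of features after flattening a (channels, height, width) map.

    This is the value a following nn.Linear layer would need as in_features.
    """
    return channels * height * width


# Hypothetical example: a 28x28 image passed through a 3x3 convolution.
h_out = conv2d_output_size(28, kernel_size=3)                       # 26
w_out = conv2d_output_size(28, kernel_size=3)                       # 26

# Same input, but with stride 2 and padding 1.
h_strided = conv2d_output_size(28, kernel_size=3, stride=2, padding=1)  # 14

# With 16 output channels, the flattened size is 16 * 26 * 26.
flat = linear_in_features(16, h_out, w_out)                         # 10816
print(h_out, w_out, h_strided, flat)
```

Working the sizes out by hand like this before building the model is one way to catch a mismatch between the last convolutional layer and the first linear layer.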

A simple guide for interpreting what a Convolutional Neural Network is learning using PyTorch

Illustration by Author

The convolutional neural network is a particular type of artificial neural network, widely applied to image recognition. The success of this architecture began in 2012, when the ImageNet image classification challenge was won thanks to this approach.

As you probably know, these methods are very powerful and good at making predictions, but at the same time, they are hard to interpret. For this reason, they are also called black-box models.

There are certainly model-agnostic methods available, like LIME and partial dependence plots, that can be applied to any model. But in this case, it makes more sense to apply interpretable…

Data Science

An easy way to understand the most contributing features for anomaly detection

Illustration by author

Isolation Forest is one of the most widely used techniques for detecting anomalies in data. It’s based on a “forest” of trees, where each isolation tree isolates anomalous observations from the rest of the data points. Despite its simplicity, speed, and intuitiveness, it has a drawback: the lack of explanation. Why is a particular observation considered anomalous by the algorithm? How can the output be interpreted?

There are two possible kinds of interpretation, global and local. Global interpretation aims to explain the model as a whole and understand which features play a more relevant role in the overall model. On…

Web Scraping

How to extract a table’s content with Octoparse and apply clustering analysis

Illustration by author

Disclaimer: This article is only for educational purposes. We do not encourage anyone to scrape websites, especially those web properties that may have terms and conditions against such actions.

Climate change and global warming are terms you come across every day. And these concepts are all linked to one figure: 51 billion tons of greenhouse gases. These gases are produced by all the countries of the world. The data is available on a Wikipedia page, in a table called List of countries by greenhouse gas emissions.

You are probably asking yourself why scrape data about Greenhouse…

Data Visualization, Opinion

Image by author

It’s the right moment to move on to other tools for visualizing your data. Do you know Matplotlib? Forget it. It may be easy to use and light on memory, but it’s hard to observe how features change over time from static graphs.

By chance, when I started my Data Science internship, I began to use a fabulous Python library. I never dreamt that something like this would be possible. It’s called Plotly Express. You can finally interact with the graphs. …

Machine Learning

Examples of visualizations on the MNIST dataset using the Plotly Express library

Illustration by author

I am currently encountering this topic in my Data Science internship. It’s not the first time. I already saw it at university, but I never really understood why we should use this method. Most of the time it was explained in highly theoretical terms, without examples of real-world applications. So, what is Principal Component Analysis? Why should we use it? What are the advantages of this method?

Principal Component Analysis is an unsupervised technique that reduces the dimensionality of a dataset while preserving as much variability as possible.
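To make the definition more concrete, here is a minimal sketch of the idea using NumPy. It uses the singular value decomposition of the centered data, which is one common way to compute principal components. The function name pca and the small synthetic dataset are assumptions made for illustration only, not the MNIST data or the code from the article.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first n_components principal components.

    Returns the projected data and the share of the total variance
    explained by each kept component.
    """
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T
    # Squared singular values are proportional to the variance per component.
    variance = singular_values ** 2
    explained_ratio = variance / variance.sum()
    return projected, explained_ratio[:n_components]


# Synthetic data (an assumption): 200 samples, 5 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

projected, ratio = pca(X, n_components=2)
print(projected.shape)  # (200, 2)
print(ratio)            # share of variance kept by each of the two components
```

The dataset goes from five columns to two, and the explained variance ratio gives a rough idea of how much variability survives the reduction.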

Import data and do both simple and multiple aggregations

Photo by John-Mark Smith on Unsplash.

When you work with data in Python, there is surely a library that will never leave your side: pandas. It’s a pretty powerful and intuitive open source library that provides data structures that are useful for dealing with high-dimensional datasets.

There are two principal data structures:

  • Series for one-dimensional arrays.
  • DataFrame for two-dimensional tables that contain rows and columns.

In this article, I will focus on the most useful functions that split the dataset into groups. Then you can compute statistics, such as average, standard deviation, maximum, minimum, and much more.

You’ll learn to utilize the apply, cut, groupby, and…
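As a small illustration of splitting a dataset into groups and computing statistics, here is a sketch using groupby, agg, apply, and cut. The tiny DataFrame, its column names, and the bin edges are made-up assumptions for the example, not the data used in the article.

```python
import pandas as pd

# Hypothetical toy dataset.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "length": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Simple aggregation: one statistic per group.
mean_by_species = df.groupby("species")["length"].mean()

# Multiple aggregations: several statistics per group at once.
stats = df.groupby("species")["length"].agg(["mean", "std", "min", "max"])

# apply: a custom function per group, here the range (max minus min).
ranges = df.groupby("species")["length"].apply(lambda s: s.max() - s.min())

# cut: turn a numeric column into labeled intervals, then group by them.
# The bins are right-inclusive by default: (0, 2], (2, 4], (4, 6].
df["size"] = pd.cut(df["length"], bins=[0, 2, 4, 6],
                    labels=["small", "medium", "large"])
counts = df.groupby("size", observed=False)["length"].count()

print(mean_by_species)
print(stats)
print(ranges)
print(counts)
```

Passing a list of function names to agg is a convenient way to get a summary table, while apply is handy when the statistic you need has no built-in name.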

A first step to take before scraping a website using Python

Computer on desk
Photo by Carl Heyerdahl on Unsplash.

I discovered web scraping while working towards my master’s degree in Data Science. It wasn’t one of my courses, but I helped a friend with a project about this topic in her study program. It was hard to understand what basics I needed to solve this enigma. At the same time, the more difficult I found the task, the more compelled I felt to solve the mystery.

What is web scraping? Look at the words. Web refers to a website, while scraping is about the extraction of data. By merging the two words, you can understand the real meaning: extracting…

Let’s apply Isolation Forest with scikit-learn using the Iris Dataset

Image of a red flower among yellow flowers
Photo by Rupert Britton on Unsplash

Anomaly detection is the identification of rare observations with extreme values that differ drastically from the rest of the data points. These observations are called outliers and need to be identified so that they can be separated from the normal ones. There can be many causes for these anomalous observations: variability in the data, errors made during data collection, or something new and rare happening. In the last case, the outlier is not an error, as you might usually expect.

Managing outliers is challenging because it’s usually not possible to tell whether the issue is linked to faulty data collection or…
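As a rough idea of what applying Isolation Forest to the Iris dataset with scikit-learn can look like, here is a minimal sketch. The parameter values (n_estimators=100, contamination=0.05, random_state=0) are assumptions chosen for illustration and are not necessarily the settings used in the article.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest

# The four numeric features of the 150 Iris flowers.
X = load_iris().data

# contamination is the assumed share of outliers in the data (here 5%).
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)

# fit_predict returns 1 for normal observations and -1 for outliers.
labels = model.fit_predict(X)

# decision_function gives an anomaly score: negative values mean outlier.
scores = model.decision_function(X)

n_outliers = int((labels == -1).sum())
print(f"Flagged {n_outliers} of {len(X)} observations as outliers")
```

The contamination value directly controls how many observations get flagged, so it is worth trying a few values rather than trusting a single threshold.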

Data Science, Machine Learning

An application to the PHM08 Challenge Data Set provided by NASA

Illustration by author

The increasing amount of data, together with technological improvements, has led to significant changes in machine maintenance strategies. The possibility of monitoring machines’ conditions has given rise to Predictive Maintenance (PM). PM has evolved over the last decade and is characterized by the use of a machine’s historical time series data, collected through sensors. Using the available data, it’s possible to provide effective solutions with Machine Learning and Deep Learning approaches. Predictive Maintenance makes it possible to minimize downtime and maximize equipment lifetime.

One critical part of PM is the prediction of the Remaining Useful Life (RUL). What is it? It’s the…

Eugenia Anello

Data Science student | Intern at Statwolf | I like to share the concepts I learn every day
