Tutorial

Configure the Ubuntu shell as a terminal in the PyCharm environment

Image made by Author

If you have Windows 10 and need Ubuntu for study or work, you are in the right place. Have you tried running Ubuntu in VirtualBox before? If so, your computer probably became very slow, the setup took a lot of effort and you lost a lot of time.

Forget about it. There is a faster way that doesn’t fill up your computer’s memory. You only need to install the Ubuntu terminal from the Microsoft Store and you’re done, without any complicated steps.

Bash on Windows provides a Windows subsystem, and Ubuntu Linux runs on top of it. It is a complete Linux system inside Windows 10. Basically, it lets you run the same bash shell that you find on Linux. This way you can run Linux commands inside Windows without installing a virtual machine or dual-booting Linux and Windows [1]. …

Programming

Photo by Rodolfo Barretto on Unsplash

Dealing with dates and times in Python can be messy when analyzing datasets. There is a lot of information to take into account, such as the year, month, day, hour, minutes and seconds, but also more complex features such as durations, weekdays and time zones. For this reason, I will talk about a Python module that manipulates this type of information: datetime [1].

Datasets often represent dates as strings, and you need to convert them to datetime format in order to work with time series data. …
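
As a minimal sketch of that conversion, the snippet below parses date strings with `datetime.strptime` and then reads a few of the features mentioned above. The sample strings and the format `"%Y-%m-%d %H:%M:%S"` are assumptions for illustration, not values from a real dataset.

```python
from datetime import datetime

# Hypothetical raw values, as they might appear in a column of strings.
raw_dates = ["2020-11-30 14:05:00", "2021-01-15 09:30:45"]

# strptime parses each string according to an explicit format.
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw_dates]

first = parsed[0]
print(first.year, first.month, first.day)  # calendar components
print(first.weekday())                     # weekday as an integer, Monday is 0
print(parsed[1] - parsed[0])               # duration between the two, as a timedelta
```

If the strings in a dataset use a different layout, only the format string needs to change.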

Deep Learning

Configure a Conda environment in PyCharm to enable the use of CUDA

Image made by Author

PyTorch is a Python package used as a deep learning research platform that provides maximum flexibility and speed. PyTorch provides Tensors, which are basically the same as NumPy arrays: generic n-dimensional arrays used for arbitrary numeric computation. The biggest difference between a NumPy array and a PyTorch Tensor is that a PyTorch Tensor can run on either CPU or GPU [1].

The PyTorch installation itself is not so hard, but the steps to enable the GPU on a local machine are not trivial. …

Natural Language Processing

A simple guide to build a speech recognizer using Google’s API

Figure 1: Photo on Pixabay

Speech recognition is an interesting task that can improve your quality of life. In this never-ending Covid period, I need to watch many lesson videos, and it’s so easy to lose concentration. At the same time, having all the recordings available on my university’s website made me a perfectionist, so I would like to capture every word in my notes. But that is costly, because it takes a lot of work and time.

Luckily, providers such as Google, Amazon, IBM and many others already offer APIs that convert audio into text. In this article, I’ll focus only on the Google Speech-to-Text API, which I think is the most efficient option for transcribing many videos. …

Web Scraping

How to extract data from Medium Search results

photo by Nathan da Silva on Unsplash

Web scraping is the process of extracting data from websites. There are many use cases: we can scrape posts from social networks, products from Amazon, apartments from Airbnb or Medium posts, as I will show.

Medium is a platform where people can bring new ideas to the surface, spread them and learn new things every day. When I search for a topic, many articles come up as results, and I would like to use web scraping to get the details of each of these Medium stories. …
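
To give a rough idea of what extracting data from a page means, here is a small sketch that uses only Python’s standard `html.parser` module. The markup in `SAMPLE_HTML`, the class name `TitleLinkParser` and the tag layout are all hypothetical. They do not reflect Medium’s real page structure, which would have to be inspected first.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a search results page.
SAMPLE_HTML = """
<div class="result"><h3>Intro to KNN</h3><a href="/p/knn">Read</a></div>
<div class="result"><h3>Dates in Python</h3><a href="/p/dates">Read</a></div>
"""


class TitleLinkParser(HTMLParser):
    """Collect the text inside <h3> tags and the href of <a> tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <h3> and </h3>.
        if self.in_title:
            self.titles.append(data.strip())


parser = TitleLinkParser()
parser.feed(SAMPLE_HTML)

# Pair each title with the link found in the same result block.
stories = list(zip(parser.titles, parser.links))
print(stories)
```

On a real site the HTML would come from an HTTP request rather than a string, and the tags to look for would depend on the page.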

Programming

How to use Python and Twitter API to create your own Twitter dataset

Figure 1: Photo by JillWellington on Pixabay

Social networks are a constant part of our lives nowadays. Their popularity can be explained by accessibility and convenience, which allow users to share huge amounts of information with few or no restrictions on content. This continuous and rich stream of data is made available by these platforms for studying sentiment about brands, products, events, recent news, and social and political issues.

In this Covid-19 period, usage of these platforms has grown dramatically. Twitter in particular has increasingly been used to spread misinformation related to the pandemic. For this reason, I am going to collect tweets from the last seven days that mention the coronavirus and Giuseppe Conte, the Prime Minister of Italy. …

A library that combines the flexibility of scikit-learn and the power of PyTorch

Image made by Author

The goal of this post is to use a tool that trains and evaluates a PyTorch model in a simple way. This tool is Skorch, a scikit-learn compatible neural network library that wraps PyTorch, so it makes it possible to use PyTorch with sklearn. Moreover, it takes advantage of scikit-learn’s functions such as fit, predict and GridSearch [1]. The dataset used is MNIST, which contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing. The images are grayscale, 28x28 pixels, and centered to reduce preprocessing and get started quicker. …

Google’s research company solved the 50-year-old problem of protein folding

Figure 1: AlphaFold: a solution to a 50-year-old grand challenge in biology. Photo Credit: Deep Mind. Source: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

On 30 November 2020, the AlphaFold team from DeepMind, Google’s UK lab and research company, announced incredible news: a solution has been found for the protein folding problem, a challenge that stood for half a century. AlphaFold2, the latest version of the protein structure prediction system AlphaFold, took part in the 14th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition. In this competition, it reached a very high level of accuracy in predicting the 3D shape a protein will form based on its sequence of amino acids. AlphaFold’s performance improved a lot in just two years. …

A guide to tune hyperparameters of KNN with Grid Search and Random Search

Photo by Andrea Piacquadio on Pexels

Table of contents

  1. Introduction
  2. Prepare Data
  3. Create KNN Model
  4. Model Evaluation
  5. K-fold cross-validation
  6. Grid Search
  7. Random Search

Introduction

K-Nearest Neighbors (KNN) is a supervised, non-parametric technique used for classification and regression. It is supervised because the data is already labelled, and non-parametric because it makes no underlying assumption about the data distribution. Unlike other models, it doesn’t use the training data to build a model in advance. All the training data is kept and used in the testing phase, which makes training faster and the testing phase slower and costlier.

In KNN, K is the number of nearest neighbors. Choosing the “right” number is important, not only because the algorithm requires this parameter, but also because the number of nearest neighbors affects the performance of the model. …
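
To make the role of K concrete, here is a minimal pure-Python sketch of a KNN classifier. It is not the scikit-learn implementation. The function name `knn_predict`, the toy points and the choice of Euclidean distance are assumptions made for illustration. Note that there is no fitting step: the training points are simply stored and searched at prediction time.

```python
from collections import Counter
import math

# Hypothetical 2-D training points with their labels.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((3.0, 3.0), "b"),
    ((3.2, 2.9), "b"),
    ((2.9, 3.3), "b"),
]


def knn_predict(train, query, k):
    """Return the majority label among the k training points closest to query."""
    # Sort all training points by Euclidean distance to the query (Python 3.8+).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


query = (1.5, 1.5)
for k in (1, 3, 5):
    print(k, knn_predict(train, query, k))
```

With these toy points the predicted label for the same query changes once K grows large enough to include the whole dataset, which is why the value of K has to be tuned rather than guessed.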

Comparison with EMBOSS Needle Tools for Global and Local Alignments

Figure 1: Photo by geralt on Pixabay

Biopython is an international association of developers of freely available Python tools for computational molecular biology. In particular, it provides tools for sequence alignments, which are methods that compare and detect similarities between biological sequences. Sequence similarity means that the sequences compared have similar or identical residues at the same positions of the alignment.

In this article, I’m going to focus on pairwise alignment. My goal is to reproduce with Biopython the results obtained with EMBOSS Needle, an efficient web server that provides many tools for sequence alignment. When we need to handle a huge volume of data, it’s preferable to know Biopython well, since it produces the output faster than doing it manually on a website. As background, I suggest you read my previous article, in which I explained the concepts of similarity, identity, and global and local alignments in a simple way. These concepts will be useful when I apply them with Biopython. …
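
The idea behind a global alignment can be sketched with the classic Needleman-Wunsch dynamic programming recurrence. This is a plain-Python illustration, not Biopython’s or EMBOSS Needle’s implementation. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for readability. Real tools typically use substitution matrices and separate gap opening and extension penalties, so their scores will differ.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Either align the two residues, or insert a gap in one sequence.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # residue against residue
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("GATTACA", "GATACA"))
```

The function returns only the score. Recovering the aligned strings would require an extra traceback step through the table.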

About

Eugenia Anello

I am a Data Science student and a travel enthusiast | I learn something new every day | https://www.linkedin.com/in/eugenia-anello-545711146
