A guide to tuning the hyperparameters of KNN with Grid Search and Random Search

Photo by Andrea Piacquadio on pexels

Table of contents

  1. Introduction
  2. Prepare Data
  3. Create KNN Model
  4. Model Evaluation
  5. K-fold cross-validation
  6. Grid Search
  7. Random Search

Introduction

K-Nearest Neighbors (KNN) is a supervised, non-parametric technique used for classification and regression. It is supervised because the data is already labelled, and non-parametric because it makes no underlying assumption about the data distribution. Unlike other models, it has no explicit training phase that builds a model from the data. Instead, all the training data is stored and used at prediction time, which makes training fast and the testing phase slower and costlier.

In KNN, K is the number of nearest neighbors. Choosing the “right” number is important, not only because the algorithm requires this parameter, but also because the number of nearest neighbors determines the performance of the model. …
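To make the role of K concrete, here is a minimal sketch of a KNN classifier in plain Python. It is not the code from the article: the function name `knn_predict` and the toy data are assumptions made up for illustration. It shows that nothing is fitted in advance, since all the work happens at prediction time, and that the same query can receive a different label when K changes.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # There is no training step: every stored point is compared to the query.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours (Euclidean distance).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train_points = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_labels = ["a", "a", "b", "b", "b"]
query = (1.5, 1.5)

prediction_k1 = knn_predict(train_points, train_labels, query, k=1)
prediction_k5 = knn_predict(train_points, train_labels, query, k=5)
print(prediction_k1, prediction_k5)
```

With K=1 the query takes the label of its single closest point, while with K=5 every training point votes and the larger class wins. This sensitivity is why K is usually tuned with a search procedure rather than picked by hand.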


Comparison with EMBOSS Needle Tools for Global and Local Alignments

Figure 1: Photo by geralt on pixabay

Biopython is an international association of developers of freely available Python tools for computational molecular biology. In particular, it supports sequence alignment, a family of methods for comparing biological sequences and detecting similarities between them. Sequence similarity means that the compared sequences have similar or identical residues at the same positions of the alignment.

In this article, I’m going to focus on pairwise alignment. My goal is to reproduce with Biopython the results obtained with EMBOSS Needle, an efficient web server that provides many tools for sequence alignment. When we need to handle a huge volume of data, a good knowledge of Biopython lets us obtain the output faster than doing it manually on a website. As background, I suggest you read my previous article, in which I explained the concepts of similarity, identity, global alignment and local alignment in a simple way. These concepts will be useful when I apply them with Biopython. …


A complete guide to understanding the most used approaches for Time Series forecasting using R

Photo by Jordan Benton on Pexels

SARIMA and ARIMA are among the most widely used approaches to time series forecasting. These models are useful for describing autocorrelated data. Autocorrelation is a typical characteristic of time series, in which the values measured over time are correlated with other values in the same series. Other relevant features of a time series are trend, seasonality, and cycle, topics that I discussed in my latest article about Time Series Analysis.
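As a rough illustration of what autocorrelation measures, the sketch below computes the sample autocorrelation of a series at a given lag. The article itself works in R. This Python version, the function name `autocorrelation` and the toy series are assumptions used only to show the idea.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    denominator = sum((x - mean) ** 2 for x in series)
    # Numerator: products of deviations taken `lag` steps apart.
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Toy series that repeats every 6 observations (invented for illustration).
series = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2]

r0 = autocorrelation(series, 0)  # a series is perfectly correlated with itself
r3 = autocorrelation(series, 3)  # half a period apart: values move in opposition
r6 = autocorrelation(series, 6)  # one full period apart: values move together
print(r0, r3, r6)
```

A positive value at the seasonal lag and a negative value at half that lag is the kind of pattern that hints at seasonality, which is what the seasonal part of a SARIMA model is meant to capture.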

For example, let’s consider the prices of Amazon’s items from 1996 to 2020 as our time series. Are the product prices correlated with each other over time? What will happen if the price of Amazon’s items changes? Can it have an effect on future sales? These are all questions that can be answered using SARIMA and ARIMA models. Taking the dependence between these prices into account is important for accurate sales forecasting, which plays an important role in reducing costs and improving customer service. …


A tool to compare a biological sequence with many other sequences in a database

Figure 1: NCBI website

Database sequence similarity search has become a central tool in bioinformatics because the amount of sequence data is growing exponentially over time [1]. It is done for various reasons, such as finding relationships between the query sequence and other sequences in the databases, understanding the likely function of a sequence, identifying regulatory elements, understanding genome evolution, or assisting in sequence assembly. Database search is based on sequence alignment methods and relies on pairwise sequence comparison. Sequence alignment can be global (whole sequence alignment) or local (partial sequence alignment), and there are algorithms to find the optimal alignment given particular comparison criteria. …
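One classic algorithm for optimal global alignment is Needleman-Wunsch. The sketch below computes only the optimal global score with dynamic programming. The function name and the scoring values (match, mismatch and a linear gap penalty) are assumptions chosen for readability. Real tools typically use substitution matrices and affine gap penalties, so this will not reproduce their numbers.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Align a[i-1] with b[j-1], or insert a gap in either sequence.
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    # The whole of both sequences must be aligned, so read the last cell.
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACT"))
```

The "particular comparison criteria" mentioned above correspond to the `match`, `mismatch` and `gap` parameters here: changing them changes which alignment counts as optimal.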


A simple guide to the basic concepts of Time Series

Photo by emilyprovansal on Flickr

A time series is a series of data points ordered in time, where time is often the independent variable. The goal is to estimate how the sequence of observations will continue into the future. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Time series analysis can be useful in specific economic and social contexts.

Examples of time series are daily IBM stock prices, quarterly sales results for Amazon and annual Google profits. Anything that is observed sequentially over time is a time series [1].

In this article I’ll talk about the basic concepts we need to know about time series. My goal is to introduce fundamental aspects that are not intuitive and are essential to interpret and model a time series. Every notion will be followed by an example obtained with R. I’ll use the fpp2 package, which loads several other packages, including forecast and ggplot2. It also provides several datasets, which I will use for the plots. …


Working with the EMBOSS Needle Tools for Global and Local Sequence Alignment

Figure 1: Pairwise alignment using EMBOSS Needle (https://www.ebi.ac.uk/Tools/psa/)

Sequence alignment is one of the fundamental problems in computational biology and has numerous applications. It is the process of comparing biological sequences (protein or DNA sequences) and detecting similarities between them. What “similarities” are being detected will depend on the goals of the particular alignment process.

In this article, I will talk about pairwise sequence alignment. In particular, I am going to explain two famous techniques, Global and Local Alignment, and use a web server, Emboss Needle, that implements these techniques. My goal is to explain clearly concepts that seem abstract at first glance but become clearer as soon as you apply them. …
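To show how a local alignment differs from a global one, here is a minimal sketch of the Smith-Waterman scoring recurrence, the classic algorithm for local alignment. The function name and the scoring values are assumptions for illustration, not the settings of any web server. The key difference from a global alignment is that a cell score never drops below zero, so the best-scoring region can start and end anywhere in the two sequences.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment restart instead of carrying a penalty.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            # The answer is the best cell anywhere, not the last cell.
            best = max(best, current[j])
        previous = current
    return best


# Two sequences that only share the short region "ACG" (invented example).
print(local_alignment_score("TTTACGTTT", "CCCACGCCC"))
print(local_alignment_score("AAAA", "TTTT"))
```

In the first call only the shared region contributes to the score, while the unrelated flanking residues are ignored. A global alignment of the same pair would be penalised for every mismatching position.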


These tools will allow you to READ all the world’s languages without effort

Photo by Glenn Carstens-Peters on Unsplash

We all know Google Translate, the free multilingual translation service developed by Google to translate text and websites from one language into another.

But sometimes it’s not enough: you need more precise websites like WordReference or context.reverso.net to translate words that have more than one meaning or that change depending on the preceding or following word.

But aren’t you tired of manually translating one word at a time? Is it possible to translate consecutive words without changing page every time? Could we quickly translate phrases from PDF files?

Now it’s possible with the extensions available in Google Chrome and in other browsers.

Users can customize and improve their browsing experience with the countless extensions and apps available on many browsers. …



What to see:

  1. Fondaco dei Tedeschi

Fondaco dei Tedeschi is a brand new luxury shopping center in the heart of Venice, near the Rialto Bridge. It was once the headquarters of German merchants (mercanti tedeschi in Italian), who used it as a warehouse and accommodation, which is why the palace is called “dei Tedeschi”. The palace has a square plan and is arranged around a single courtyard. Inside the building there are four arcaded floors with round arches, with stores and shops along the sides. If you don’t want to go broke, the most interesting thing is definitely the terrace. …



What to see:

  1. Fondaco dei Tedeschi

The Fondaco dei Tedeschi is a luxury shopping center in the heart of Venice, right by the very central Rialto Bridge. It was once the headquarters of the German merchants, who used it as a warehouse and lodging, which is why the palace on the Grand Canal was called “dei tedeschi”. The palace has a square plan and is arranged around a single courtyard. Inside there are four floors of loggias with round arches, along the sides of which the warehouses and shops were laid out. For anyone who does not want to go broke, the most interesting part is definitely the terrace. …



What to see

  1. Doss Trento

Doss Trento is a small hill that rises on the right bank of the Adige river, in the Piedicastello district of the city of Trento. Its highest point reaches 309 m above sea level, and the hill is covered by 8 hectares of forest. It has been a very important site since Neolithic times, and remains of human presence have been found here. There are ruins of an early Christian basilica, and during the Great War the hill played an important role as a defensive stronghold built by the Austrians to defend Trento.

On “Monte Verruca” you will find the mausoleum of Cesare Battisti, where a photo gallery tells his story, the historical museum of the Alpini, a tribute to the military corps, and the Gallerie, a space dedicated to recounting and representing the past, history and memory of Trentino. …

About

Eugenia Anello

I am a Data Science student and a travel enthusiast | I learn something new every day | https://www.linkedin.com/in/eugenia-anello-545711146
