Breaking CAPTCHAs using machine learning

Everyone hates CAPTCHAs — those annoying images that contain text you have to type in before you can access a website. CAPTCHAs were designed to prevent computers from automatically filling out forms by verifying that you are a real person. But with the rise of deep learning and computer vision, they can now often be defeated easily. So let’s get started.

Read More

Text Classification using machine learning

Text classification is one of the important tasks that can be done using machine learning algorithms. In this blog post I am going to share how I started with a baseline model, then tried different models to improve the accuracy, and finally settled on the best model. The goal here is to improve the category classification performance for a set of text posts. The evaluation metric is the macro F1 score.
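For readers new to the metric, here is a minimal sketch of how a macro F1 score can be computed by hand: F1 is calculated separately for each category and the per-category scores are averaged with equal weight. The function name `macro_f1` and the sample labels are made up for illustration and are not taken from the post's dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving every class equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical categories, used only to exercise the function.
y_true = ["sports", "sports", "politics", "tech"]
y_pred = ["sports", "politics", "politics", "tech"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every category counts equally, a model that does well on the frequent categories but poorly on the rare ones is penalized more than it would be under plain accuracy.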

Read More

Top 100 Data science interview questions

Data science, also known as data-driven decision making, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge from data in various forms, and to make decisions based on this knowledge. A data scientist should not be evaluated only on his/her knowledge of machine learning; he/she should also have good expertise in statistics. I will try to start from the very basics of data science and then slowly move to the expert level. So let’s get started.

Read More

Kibana Timelion for Time Series Analysis

Kibana is very popular nowadays for visualizing Elasticsearch data, but one area where Kibana falls short is time series analysis and visualization. This is precisely where Timelion comes into the picture.

Read More

Elasticsearch tutorial for beginners using Python

This tutorial is for beginners who want to learn Elasticsearch from scratch. In this tutorial I am going to cover all the basic and advanced stuff related to Elasticsearch. So let’s get started.

Elasticsearch:-

Elasticsearch is a real-time distributed search and analytics engine. It allows you to explore your data at a speed and at a scale never before possible. It is used for full-text search, structured search, analytics, and all three in combination. Elasticsearch is an open source search engine built on top of Apache Lucene, a full-text search engine library.
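To make "all three in combination" a bit more concrete, here is a rough sketch of a search request body written as a Python dictionary in the Elasticsearch query DSL. It combines a `match` query (full-text search), a `term` filter (structured search) and a `terms` aggregation (analytics). The field names `body`, `category` and `author`, and the helper `build_search_body`, are assumptions made up for this example, and the sketch only builds the request without sending it to a cluster.

```python
import json


def build_search_body(text, category, top_n=5):
    """Build a request body that mixes full-text, structured and analytic parts.

    The field names ("body", "category", "author") are hypothetical.
    """
    return {
        "query": {
            "bool": {
                # Full-text search: analyzed match against the post text.
                "must": [{"match": {"body": text}}],
                # Structured search: exact-value filter, not scored.
                "filter": [{"term": {"category": category}}],
            }
        },
        # Analytics: count the matching documents per author.
        "aggs": {
            "posts_per_author": {"terms": {"field": "author", "size": top_n}}
        },
    }


body = build_search_body("flood relief", "news")
print(json.dumps(body, indent=2))
```

A dictionary like this would typically be passed to the search call of a Python Elasticsearch client, against an index whose mapping defines `category` and `author` as exact-value (keyword) fields.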

Read More

Twitter Sentiment Analysis

Learning new things is always exciting. Today I will be sharing about Twitter sentiment analysis, but first we need to get old data. Using the Twitter API it is not possible to get older tweets. To get older data I am using the Jefferson utility. Clone this repository to your local machine and run the command below to get 6000 tweets from 1-12-2015 to 2-12-2015 for #ChannaiFloods.

Read More

Create Beautiful, Interactive data visualizations using Plotly in Python

I have been using ggplot to plot in Python, but the limitation of ggplot is that it is not very interactive. Then my exploration started, and I found D3.js to be a good alternative for plotting interactive graphs. D3.js is not very popular in the data science community because it requires knowledge of JavaScript and CSS. Today, I am going to tell you about something that will change the way you perform data visualization in the language / tool of your choice (R, Python, MATLAB, Perl, Julia, Arduino).

Read More