Do you think Machine Learning can help
your business grow? Everything from Healthcare and Agriculture to TVs and
Smartphones is getting transformed with the advent of Artificial Intelligence.
But is it just the flavor of the month? How can machine learning empower you and me? Clutch.AI provides the most powerful zero-code machine learning workbench for everyone who isn't a data science expert. Last year, an Element AI report indicated that only 22,000 people have the right skills to create machine learning systems. The workbench streamlines the machine learning process into a simple step-by-step workflow that doesn't take much more than a few clicks.
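That workflow, ingest, split, train, and evaluate, can be sketched in a few lines of conventional ML code. Here's a minimal sketch using scikit-learn and synthetic data (both are illustrative assumptions; this isn't Clutch.AI's internal implementation):

```python
# Illustrative sketch of the workflow the workbench automates:
# ingest -> split -> train -> evaluate. scikit-learn and synthetic
# data stand in for the real engine and the uploaded spreadsheet.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in for the uploaded file: 1,000 applicants, 8 features.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Hold out test data, as the workbench does behind the scenes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

The workbench wraps each of these steps behind a click, so the same pipeline runs without writing any of this code.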
Starting with data ingestion and EDA, it takes you all the way to deploying the model into your applications in a matter of minutes! The workbench injects AI into traditional rule-based models, and we WhiteBox the Black Box.
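The idea behind WhiteBoxing a model, attributing each prediction to individual input features, is easiest to see with a linear model, where a feature's contribution is just its weight times its value. A toy sketch with made-up weights and feature names (real explainers generalize this idea to arbitrary models):

```python
# Toy illustration of per-feature prediction explanations. For a
# linear (logistic) model, each feature's contribution to the score
# is coefficient * value. The weights and applicant values below are
# invented for illustration, not taken from any real model.
weights = {"cibil_score": 0.8, "days_past_due": -1.2, "inbox_spam": -0.4}
applicant = {"cibil_score": 1.5, "days_past_due": -0.3, "inbox_spam": 0.2}

contributions = {f: weights[f] * applicant[f] for f in weights}
for feature, c in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    # Positive contributions push toward "will repay",
    # negative ones toward "will default".
    print(f"{feature:>14}: {c:+.2f}")
```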
Unlike most machine learning platforms, we can show you why the algorithms came up with their predictions! Without further ado, let me show you how you too can predict the future from your own data.

I'm uploading a file from my computer, based on a credit rating engine we built for one of the only profitable fintechs in India. The Clutch.AI workbench instantly ingests a variety of real-time and static data. We've collected a thousand records from retailers applying for loans, along with their repayment history, social profile, and cell phone metadata. Rows from the spreadsheet are shown here; each row is an individual, with traditional indicators like their credit rating (known here in India as the CIBIL score), days they've been delinquent on previous loan repayments (also known as DPD), online behavior that tells us about their relationship status and the languages they speak, and cell phone metadata that gives us features like inbox spam, the number of Wi-Fi points they connect to, and so on. We're going to discard the Customer ID and phone number fields, since they're unlikely to contribute to credit risk. This file includes about a thousand people, so I'll be running this on a single node, but we do provide the option to run on a cluster comprising multiple nodes for larger data sources. As soon as the data is ingested, we can
see histograms that show whether we're dealing with continuous or categorical variables. We also show in-depth statistical analysis. This helps us get a feel for the data before actually seeing relationships and recognizing patterns with Exploratory Data Analysis.

The bar graphs and histograms we saw earlier only show information about single variables. Radial Visualization shows us how different variables impact the target variable (the output variable to be predicted), in this case Loan Default. As an exploratory tool, radial visualization arranges factors on a circle based on their influence on the data. Each dot refers to one of the individuals who applied for a loan from the spreadsheet. The legend shows that defaulters are orange; those who paid back the loan are blue. Since defaulters are spread throughout, a traditional rule-based model would leave out a lot of creditworthy individuals. In the top-left corner of the cluster, there's an individual with a CIBIL credit score of 303 who still paid back the loan on time. And there's someone in the bottom left with a CIBIL score of 714 who still hasn't repaid the loan. Outliers are thus also obvious in the Radial Visualization. We also provide scatter plots and correlations, both of which show pairwise relationships between two variables and can be generated simply by naming the analysis. I'll go ahead and name this one "merchant correlation", and as you can see, it's created instantly! Did you know that loan defaulters tend to connect to fewer Wi-Fi points? According to this graph, defaulters also tend to have more inbox spam.

Moving on to unsupervised models, let's
start with a self-organizing map. Up top here, we can see advanced parameters to tweak the model. In every instance, the default parameters have been set up to make model creation easier! We can also read more about the parameters and tweak them in further iterations. I've named this model Loan SOM. As we can see here, Self-Organizing Maps represent the same data points in every box; in this case, each box includes all thousand loan applicants, with colors ranging from blue for the lowest values to red for the highest. We can see that some of the youngest loan applicants, in the bottom left of the age map, also had the highest CIBIL credit scores, and they were among the least likely to default on their loans. In fact, applicants with the lowest CIBIL credit scores were much less likely to default on their loans than those we saw with higher credit scores in the bottom-right corner of the CIBIL and Loan Default maps.

Supervised models are arranged in ascending order of
complexity, from Logistic Regression to Deep Neural Networks. Starting with Logistic Regression, I'll set the target variable to Loan Status, name the model, and optimize the parameters, since I don't want to delve too much into the nitty-gritty at first. As you can see, it gets created instantly. Here in the confusion matrix of the Logistic Regression model, we can see that the model was 80% accurate on the test data and about 79% accurate on the training data that was used to fit it. Since I'd like to see a bit more accuracy before I deploy the model into pre-production with an API, I'll see if a Random Forest algorithm performs better on this data. As you can see, it's just as easy to build this model, especially since I'm opting for optimized parameters, where we run an optimization algorithm on top of the data to choose the best parameters. By the way, this is a completely asynchronous platform: users can easily move from one part of the platform to the next, even if it takes time to come up with a model from your data. As you can see, the Random Forest algorithm does perform a bit better with this data.
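A comparison like this one, where two models score the same held-out split and accuracy is read off the confusion matrix, can be reproduced in miniature. Here's a sketch using scikit-learn on synthetic data standing in for the loan spreadsheet (an illustration of the technique, not the workbench's actual code):

```python
# Miniature version of the comparison above: logistic regression vs.
# random forest on one train/test split, with accuracy read off the
# confusion matrix. Synthetic data stands in for the loan records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=1000, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=1)):
    cm = confusion_matrix(y_te, model.fit(X_tr, y_tr).predict(X_te))
    tn, fp, fn, tp = cm.ravel()
    # Accuracy = correct predictions (the diagonal) over all predictions.
    print(f"{type(model).__name__}: accuracy = {(tn + tp) / cm.sum():.2f}")
```

Ensembling the two, as the demo does next, amounts to combining their outputs, for example by averaging predicted probabilities (soft voting).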
The confusion matrix showed us that accuracy on the test data improved a couple of points, to 82%.

Ensembling refers to combining two or more analytical models and synthesizing their results for improved accuracy. I'll go ahead and combine the Logistic Regression and Random Forest models we've built, and as you can see, the ensemble is created instantly! As it turns out, the accuracy did indeed improve. Now that we have improved accuracy, we can download the model as PMML and embed it in Java or Python.

Remember I mentioned that Clutch.AI is unique because of how we WhiteBox the Black Box. All we need to do is set the explainer to true and choose the number of samples it should be based on, and we can explain every model we build, including ensembles. And this is all you need to do: I'll pick the target variable, optimize parameters, and train the Gradient Boosting Classifier model here. As you can see, it shows up in less than a couple of seconds. When we take a look at the confusion matrix, we can see there are no false negatives. The numbers we see here in red ought to be zero for complete accuracy, but at least this model is erring on the side of caution, so I'm deploying it to the server. As you can see, with just a couple of clicks you can deploy a REST API into your application. I'm sending the payload of another loan applicant here. Below, we can see which variables contributed to this individual being labeled a non-defaulter and how much each variable contributed to the rating. Predictably, his higher CIBIL score contributed, but so did his lack of dependents and low inbox spam.

Thank you for taking the time to watch this
video! If you liked it, please check out our website and follow us on social media: we're active on Facebook, LinkedIn, Twitter, AngelList, Instagram, and here on YouTube. We also have a weekly newsletter that you can sign up for on our website; we send out emails every Friday about AI, Machine Learning, and the latest trends in the world of Data Science.

Clutch.AI: Zero Code Machine Learning Workbench
