From Excel to Data Science in 30 minutes (or less).

Marcin Rybicki
Oct 23, 2022
It’s not called Human Learning, right?

AI means Machines are Learning. When humans learn, it is called statistics!

Let Machines do the Learning!

Maybe you never wanted to be a Data Scientist but you surely want to make money like one. With Unsupervised Learning, it’s easy. It is a technique where models train themselves and no supervision is required — hence the name.

This article can help you to build models like this one:
https://marieai.com/store/?itemID=hotelsContextOne

Let humans choose what to learn!

You do, however, need to provide a good dataset with relevant information. Do the Data Engineering (collecting and cleaning data) yourself and let the machines (algorithms) do the Data Science. They are capable of spotting patterns and anomalies hidden in data. Just sit down and collect data.
Using this technique, your models can compete with the ones made by PhDs in Machine Learning!

How to start?

Some time ago I invented and developed a novel AI algorithm called MarieAI, named after Marie Curie. It lets you take simple datasets, the kind you can prepare in Excel, and train Machine Learning models with no prior knowledge. MarieAI works in an unsupervised fashion, building models (blocks) from the text files you provide. You can use these blocks (called Tiles) to create sophisticated models.

MarieAI Tiles — building blocks for smart AI

So, how to start? Prepare your data in Google Sheets or Excel. But first, read what you can do with it.

What’s possible?

Using this novel technique, you can train classifiers, recommenders and forecasting models. You can use the generated models on your server, in the user’s browser or in a mobile app, because they are stored in JSON files.
Models are small, so a single analysis takes just a few milliseconds. And since a model can run in a browser, processing can be distributed across millions of devices.
That’s what is possible in general, but today I’ll show you how to use MarieAI to train a context detection model for hotel reviews.

Train context classifier with just text files

What’s required?

Let’s make a model that detects when someone is talking about hotels.
For this task we need:
1. a file with some context related to hotels,
2. some other files to provide general context (so the algorithm can see the difference).
You can find samples here: https://drive.google.com/drive/folders/1_1pu6VnJtWKuKS0ccsbFCGfacato2LVU?usp=sharing

Re 1. For this example I prepared a list of several thousand hotel reviews in English and created a dataset like the one here [link].
Re 2. To provide the opposite context and give MarieAI a broader range of text to sieve through, I prepared three other files:
ecommerce reviews (basically a copy and paste of several thousand product reviews from ecommerce websites)
entertainment news: articles from the entertainment category of news websites
lifestyle news: collected the same way as the “entertainment” file.

Samples: https://drive.google.com/drive/folders/1_1pu6VnJtWKuKS0ccsbFCGfacato2LVU?usp=sharing
Yes, you only need several text files, one per topic, each with plenty of samples.
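If your samples live in a spreadsheet, one way to get to separate text files is to export the sheet as CSV and split it by topic. The sketch below is only an illustration: the column names `topic` and `text`, the one-entry-per-line layout and the `<topic>.txt` file names are all assumptions, since the exact layout MarieAI expects is not described here. Compare the result with the sample files linked above before uploading.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path


def split_by_topic(csv_text, topic_col="topic", text_col="text"):
    """Group spreadsheet rows into one list of cleaned entries per topic.

    The column names are hypothetical; rename them to match your sheet.
    """
    entries = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        topic = (row.get(topic_col) or "").strip().lower()
        # Collapse line breaks and repeated spaces so each entry fits on one line.
        text = " ".join((row.get(text_col) or "").split())
        if topic and text:  # skip rows with a missing topic or empty text
            entries[topic].append(text)
    return dict(entries)


def write_topic_files(entries, out_dir):
    """Write one UTF-8 text file per topic, one entry per line (assumed layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for topic, texts in entries.items():
        path = out_dir / f"{topic}.txt"
        path.write_text("\n".join(texts) + "\n", encoding="utf-8")
        paths[topic] = path
    return paths
```

Typical use would be to read the exported CSV with `Path("reviews.csv").read_text(encoding="utf-8")` (a hypothetical file name), pass it to `split_by_topic`, and then hand the result to `write_topic_files`.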

How much data is required?

I recommend having at least five thousand entries for each category. The more, the better. Very broad topics might require twenty thousand or more. Keep in mind that there is an abundance of content on the internet, so you can surely find what you need.
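A quick way to check your files against that recommendation is to count the entries per category before uploading. This is a minimal sketch, not part of MarieAI: it assumes each category is already loaded as a list of strings, and counting only unique entries is my own assumption (exact duplicates probably add little).

```python
MIN_ENTRIES = 5_000  # the minimum per category recommended above


def dataset_report(entries_by_topic, minimum=MIN_ENTRIES):
    """Return total and unique entry counts per topic, flagging small topics."""
    report = {}
    for topic, entries in entries_by_topic.items():
        # Ignore blank entries and exact duplicates when judging size.
        unique = {e.strip() for e in entries if e.strip()}
        report[topic] = {
            "total": len(entries),
            "unique": len(unique),
            "enough": len(unique) >= minimum,
        }
    return report
```

For a very broad topic you could pass `minimum=20_000` instead, in line with the higher figure mentioned above.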

The broader you go, the more samples you need

Easy steps

Assuming you have some separate datasets that look like this:

All you have to do is:
1. Upload them all to MarieAI and wait a little while,
2. Download the completed package, unzip it and run it in a browser. Yes, the model works without an external computing cloud, so it’s cheap and easy to use,
3. Open the model testing page (the attached HTML document) and
4. Test it, to see how the model trained on your dataset handles external examples (ones you did not include in your set).
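Step 4 needs examples the model has never seen. If you have no separate source for them, one option is to set a slice of each file aside before uploading. The sketch below is an assumption about how you might do that, not a MarieAI feature; the 10% fraction and the fixed seed are arbitrary choices.

```python
import random


def hold_out(entries, fraction=0.1, seed=42):
    """Split entries into (upload, held_out) lists with no overlap.

    The held-out part is kept away from training and pasted into the
    model testing page later.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(entries)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * fraction))
    return shuffled[cut:], shuffled[:cut]
```

Upload only the first list for each topic and keep the second one for manual testing.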

When you are satisfied with the results, send your JS files to a developer to use with your data. If you are a developer, simply use the MRIscoreSamples(string) class to receive results on the fly. Look at the HTML example and you’ll figure it out.

Wait, there is a bonus!

As you may have noticed, I provided four different topics. So I get not only a hotel context classifier; the same model also classifies all the remaining topics. In practice, we have a hotels classifier as well as an ecommerce products classifier, plus two other topics that are quite broad and mostly serve as counter-context. You can also create a batch of several topics. This is how I made the language detection model you can find here.

With many different samples you have an extended classifier Tile

Click here to test language detection.

Is it free?

You can train your models for free. All trained models are free to use in both personal and commercial cases. So yes, it’s completely free!

Why, you might ask? I’m trying to convince people to ditch old neural-network-based frameworks and start working with more intuitive, modern solutions.

Where can I use it?

You can use them in a browser, on-device and on a server.
Models don’t require a computing cloud or an external connection. Everything is done within a JS file with attached weights. It’s also a great way to work with sensitive data: once you train a model, you can analyse sensitive or confidential information on your device without sending it to the cloud.

What’s the operating cost?

It’s literally the cheapest and greenest way to use AI today. When you deploy models on users’ devices (i.e. in the app), you don’t even have to worry about the processing power of your servers, because no request is required to perform classification. If a classification task takes several milliseconds on the user’s device, you don’t have to pay for server computing time. Society also doesn’t have to bear the cost of the infrastructure required to send all those requests to the cloud and back.

Can I use AI in a browser without an AI cloud?

Yes, models are usually small enough to include in your JavaScript sources and perform the whole classification procedure locally in the browser, on either desktop or mobile. You literally have zero infrastructure costs related to classification processes.

Want to know more? MarieAI project: https://marieai.com/

About Me

Hi, I’m Marcin. Former game developer and algorithm designer.
Currently working on a novel, general-purpose algorithm you can use offline on any device. My dream is to deliver a solution that could be trained, up-trained and work as a swarm, solving insanely complex problems.

This is how MarieAI (https://marieai.com/) was born.
Concept of a Digital Hippocampus
Data Layering technique, explained here.

Find me on LinkedIn: https://www.linkedin.com/in/marcin-rybicki-qa/
