Text Classifier — tutorial for marieai.com
Today, you will learn how to build ML classifiers using nothing but plain text documents and access to MarieAI.com.
Hi, I’m Marcin, the creator of an alternative Machine Learning technique named MarieAI, after Marie Curie, the Polish scientist.
Today I want to show you how simple it is to use alternative Machine Learning technologies without knowing maths, data science, or programming. All you need is common sense and, perhaps, a dataset. But don’t worry, I’ve got you covered!
1. What is a classifier?
A classifier is an algorithm that takes a piece of data (such as a sentence) and assigns it to a category it has previously learned from similar data.
For example, if you want to classify news into several categories, you simply collect the texts for each category into a separate file:
a. “general news” — goes to news.txt
b. “food” — goes to food.txt
c. “lifestyle” — goes to lifestyle.txt
Pretty simple, isn’t it? I told you, you only need common sense to use MarieAI.
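For readers who do code: one file per category is the classic shape of a labeled training set, where the file name is the label and the file content is the examples. The sketch below is NOT MarieAI’s algorithm (this tutorial does not describe how MarieAI works internally); it swaps in a textbook multinomial naive Bayes classifier purely to illustrate what “learning a category from similar data” means. The helper names and folder layout are hypothetical.

```python
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def load_corpus(folder: str) -> dict[str, str]:
    """Read every *.txt file in `folder`; the file name (food.txt -> "food") is the category."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}


def train(corpus: dict[str, str]) -> dict[str, Counter]:
    """'Training' in this sketch just counts how often each word appears per category."""
    return {category: Counter(tokenize(text)) for category, text in corpus.items()}


def classify(model: dict[str, Counter], text: str) -> str:
    """Pick the category with the highest naive Bayes log-likelihood (add-one smoothing, equal priors)."""
    vocabulary = set().union(*model.values())
    words = [w for w in tokenize(text) if w in vocabulary]  # ignore words never seen in training
    best_category, best_score = "", -math.inf
    for category, counts in model.items():
        total = sum(counts.values())
        score = sum(math.log((counts[w] + 1) / (total + len(vocabulary))) for w in words)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

Usage might look like `classify(train(load_corpus("tutorialNews")), "some article text")`, where `tutorialNews` is a hypothetical folder holding news.txt, food.txt and lifestyle.txt.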
2. Getting a dataset for our News Classifier
It wouldn’t be a tutorial if I hadn’t done this for you, so you can download this package of text files. Each file contains text copied and pasted from news articles in a specific category:
https://cloud.marieai.com/tiles/tutorialNews.zip
You can collect your own data in a similar way. As you can see, Machine Learning is not rocket science!
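If you collect your own data, a quick sanity check is to compare each category file’s size with the rough thresholds recommended later in this tutorial (at least 1 MB of text for niche topics, 10 MB for mainstream ones). This is a minimal sketch: reading “1 MB” as 1,000,000 bytes is an assumption, and deciding which categories count as “mainstream” is up to you.

```python
from collections.abc import Collection
from pathlib import Path

# Rough thresholds from this tutorial: ~1 MB of text for niche topics, ~10 MB for mainstream ones.
# Treating "1 MB" as 1,000,000 bytes is an assumption made for this sketch.
NICHE_MIN_BYTES = 1_000_000
MAINSTREAM_MIN_BYTES = 10_000_000


def file_sizes(folder: str) -> dict[str, int]:
    """Map each category (file name without .txt) to the file's size in bytes."""
    return {p.stem: p.stat().st_size for p in Path(folder).glob("*.txt")}


def missing_bytes(sizes: dict[str, int], mainstream: Collection[str] = ()) -> dict[str, int]:
    """For each category, return how many more bytes of text are needed (0 means enough)."""
    report = {}
    for category, size in sizes.items():
        target = MAINSTREAM_MIN_BYTES if category in mainstream else NICHE_MIN_BYTES
        report[category] = max(0, target - size)
    return report
```

For example, `missing_bytes(file_sizes("tutorialNews"), mainstream={"news"})` would report how far each file is from its target; the folder name is hypothetical.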
3. Feeding the Algorithm
Lucky you: we have just released a test version, so for now you can freely train your Classifiers with simple text files. (Access might be limited in the future; this article was published in September 2022.)
Visit https://cloud.marieai.com/lib/marieMaker.html and “Click to Upload” your files (or the ones you downloaded from us).
4. Start Training
Once you have uploaded your files, you can see the following list:
As you may have noticed, we have added a “general news” file (news.txt) so that the algorithm can distill domain knowledge from general knowledge. This is why I’ve put the Tip above the button. Now, click the button!
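Why would a general-news file help? The tutorial does not explain how MarieAI uses it internally, but one generic, well-known way to separate domain vocabulary from everyday vocabulary is to compare word frequencies against a baseline corpus with a smoothed log-ratio, as in corpus-linguistics keyword analysis. The sketch below illustrates that idea only; it is not MarieAI’s method, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase word tokens in `text`."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def domain_keywords(domain_text: str, general_text: str, top_n: int = 10) -> list[tuple[str, float]]:
    """Rank domain words by how much more frequent they are than in the general baseline.

    score(w) = log((domain[w] + 1) / (domain_total + V)) - log((general[w] + 1) / (general_total + V)),
    where V is the combined vocabulary size (add-one smoothing). Positive = domain-specific.
    """
    domain, general = word_counts(domain_text), word_counts(general_text)
    vocab_size = len(set(domain) | set(general))
    domain_total = sum(domain.values()) + vocab_size
    general_total = sum(general.values()) + vocab_size
    scores = {
        w: math.log((domain[w] + 1) / domain_total) - math.log((general[w] + 1) / general_total)
        for w in domain
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Words like “the” score below zero because they are just as common in general news, while food-specific words float to the top.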
You can watch the training progress.
Once it is done, you can download the ZIP package.
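If you prefer to inspect the downloaded package from a script, Python’s standard `zipfile` module can list and extract it. This is a minimal sketch: the tutorial only says the package contains a JS file, an HTML file and the model weights, so the function names and any file names you pass in are hypothetical.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import Path


def list_package(source: str | Path | bytes) -> list[tuple[str, int]]:
    """List (file name, uncompressed size in bytes) for every file in the ZIP package.

    `source` can be a path on disk or the raw bytes of a downloaded package.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)  # ZipFile also accepts file-like objects
    with zipfile.ZipFile(source) as package:
        return [(info.filename, info.file_size) for info in package.infolist() if not info.is_dir()]


def extract_package(zip_path: str | Path, target_dir: str | Path) -> Path:
    """Unzip the package into `target_dir` and return that directory."""
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as package:
        package.extractall(target)
    return target
```

For example, `list_package("marieai_package.zip")` (a hypothetical file name) shows what is inside before you unzip it.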
5. Testing in HTML
Once you have the ZIP package, unzip it and look at the file structure. The JS file contains the function that classifies data, while marieAImodel[x] is a very simple JSON object with weights.
Give it a try without coding!
Simply open the HTML file in your browser. You will see a text field and some additional information:
Here is what I did in the screenshot below: I copied and pasted some text about diet trends that I had “DuckDuckGo’ed” on the internet, to see how it works:
Most consumers associate the word “diet” with weight loss, but there are many other reasons to change eating habits aside from fitting into a smaller pair of jeans. COVID brought on the pursuit of building stronger immune systems through nutrition. In response to the pandemic, the World Health Organization (WHO) issued updated dietary guidelines with the goal of maintaining a strong immune system and minimizing chronic diseases.
Keep in mind that our dataset contained only 101 KB of articles about food.
For decent results, I recommend at least 1 MB of text for niche topics or 10 MB for mainstream topics (the more specific the topic, the better the results).
6. Using JSONs
This section will be explained in the next tutorial.
About Me
Hi, I’m Marcin, a former game developer and algorithm designer.
I’m currently working on a novel, general-purpose algorithm that you can use offline on any device. My dream is to deliver a solution that can be trained, up-trained, and work as a swarm to solve insanely complex problems.
This is how MarieAI (https://marieai.com/) was born.
Concept of a Digital Hippocampus
Data Layering technique, explained here.
Find me on LinkedIn: https://www.linkedin.com/in/marcin-rybicki-qa/