Tech Blog

Neil’s Process of Personalization

Introduction

As the internet becomes more saturated with unreliable information, trustworthy news articles are becoming scarcer and harder to find. Searching the web can be cumbersome, and it is often difficult to find concise articles that cover a specific topic from multiple reputable sources.

The Neil newsreader is a teachable, personal curating service that offers hyper-curated content to the avid reader. Using Artificial Intelligence (AI) and Artificial Emotional Intelligence (AEI), Neil delivers reliable articles based on the user's specific interests. It remembers what you like and dislike by bookmarking favored reads, and its recommendation system excludes unwanted content unrelated to your searched topics and genres.

So how does Neil filter out convoluted information and deliver trustworthy news? Let's see how Neil's technology provides customized information to its users.

Neil’s Personalization Process

Neil is trained by search words. When a user inputs keywords, topics, or genres, Neil scans thousands of manually selected RSS sources. To find trustworthy content within the scope of a user's interests, the feeds from these screened sources are stored on the Neil server, while unreliable SNS sources and other unrelated websites are removed. The more searches a user performs, the more personalized Neil becomes.

The Extraction of Tokens

Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements, called tokens.

The corpus text, which is plain text with all markup tags removed, is collected from each article using program libraries specific to the selected language. Currently, Neil Publisher is available in English, Korean, and French. The corpus is then analyzed and sorted into three token types: nouns, verbs, and adjectives. Desired tokens are extracted, while unwanted and repetitive tokens, known as 'stop words', are screened out. Finally, the remaining tokens are combined to summarize the article's contents.
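The extraction step above can be sketched as follows. This is a minimal illustration, not Neil's actual pipeline: the stop-word list and the regex-based word splitting are assumptions, and a production system would use the language-specific libraries mentioned above.

```python
import re

# Illustrative stop-word list; the real list would be language-specific
# and far larger.
STOP_WORDS = {"the", "a", "an", "of", "with", "by", "and", "is", "are"}

def extract_tokens(corpus_text):
    """Break plain text into word tokens and drop stop words."""
    words = re.findall(r"[a-z]+", corpus_text.lower())
    return [w for w in words if w not in STOP_WORDS]

tokens = extract_tokens(
    "Dog people love travels guided by animals with the joy of snacks"
)
# Stop words like 'by', 'with', 'the', and 'of' are removed, leaving
# the meaningful tokens of the sentence.
```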

The vocabulary is the set of selected tokens gathered from millions of news articles through the above process. Once tokens are collected into the vocabulary, the grouped tokens from a document are turned into a vector that represents the entire article.

How Article Vectors Are Created

Article vectors represent the features of their articles: each article vector is composed of a combination of ten key content tokens, key title tokens, and the domain address. Most vectors are made up of 13-15 tokens.

Each extracted content token is represented as a binary value, 0 or 1, and the key tokens from the title are encoded the same way. When the content and title occurrences are combined, each vector entry has one of three occurrence counts: 0, 1, or 2. Once an article vector is set, it is permanently fixed. The article matrix is expanded every two hours as updated news is refreshed. Below is an illustration of an article vector.
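The combination of content and title occurrences can be sketched as below. The vocabulary and token sets are illustrative values chosen to match the Figure 1 example, not Neil's actual data.

```python
# Illustrative vocabulary matching the Figure 1 example.
VOCABULARY = ["people", "dog", "travel", "animal", "snack", "blood"]

def article_vector(content_tokens, title_tokens):
    """Combine binary content presence and binary title presence,
    so each entry is 0, 1, or 2."""
    content = set(content_tokens)
    title = set(title_tokens)
    return [int(t in content) + int(t in title) for t in VOCABULARY]

vec = article_vector(
    content_tokens={"people", "dog", "travel", "animal", "snack"},
    title_tokens={"people", "dog", "travel", "animal", "snack"},
)
# Every title token also appears in the content, so each entry
# except 'blood' is 2.
```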

article-vector-process
Figure 1 – Article Vector Process


As shown in Figure 1, Article 2 starts with the title “Dog people love travels guided by animals with the joy of snacks”.

The word tokens 'people', 'dog', 'travel', 'animal', and 'snack' are extracted from the title and also appear in the article content. Article 2's vector therefore has an occurrence count of 2 for each of these tokens, and 0 for 'blood'.

How Personal Profile Vectors Are Created

Each personal profile vector corresponds to the user's preferences, interests, and genre tastes. Once a keyword is searched, it is stored and receives five credits in the user's profile vector. When a user likes an article, every matching token in the personal profile vector tallies one positive credit; conversely, a dislike subtracts one credit. By examining a reader's profile vector, Neil can easily recognize the individual's likes and dislikes: the higher the credit count, the stronger the interest in that keyword token. Neil stores what you liked and disliked in your personal profile by bookmarking favored articles while also remembering the ones you didn't like. This is how Neil delivers up-to-date articles curated to a user's preferences.
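The credit rules above can be sketched as follows. The exact matching rule (whether likes credit only tokens already in the profile, or all of an article's tokens) is not fully specified, so this sketch credits every article token; the +5/+1/-1 values come from the description above.

```python
from collections import defaultdict

class ProfileVector:
    """A sketch of a per-user credit table over keyword tokens."""

    def __init__(self):
        self.credits = defaultdict(int)

    def search(self, keyword):
        # A searched keyword is stored with five credits.
        self.credits[keyword] += 5

    def like(self, article_tokens):
        # A like tallies one positive credit per article token.
        for token in article_tokens:
            self.credits[token] += 1

    def dislike(self, article_tokens):
        # A dislike tallies one negative credit per article token.
        for token in article_tokens:
            self.credits[token] -= 1

profile = ProfileVector()
profile.search("dog")
profile.like(["dog", "travel"])
profile.dislike(["blood"])
# 'dog' now holds 6 credits (5 from the search, 1 from the like).
```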

figure2-token-value
Figure 2 – Token Values for Likes and Dislikes


As shown in Figure 2, User 2's highest credit is 10 for both the 'dog' and 'travel' tokens, so articles containing those keywords are highly recommended by Neil's system.

On the contrary, "blood" shows the largest score in the negative direction. We can infer that User 2 dislikes "blood" and any articles related to this topic, and would most likely prefer not to see movies related to blood either.

Neil’s Recommendation System:
How Neil Achieves Personalized and Curated Articles for Users

Neil's system focuses on personalization and customization around users' tastes. Neil imports basic information about users' preferences from the Facebook and Twitter APIs. From there, Neil is continuously trained by users' searched words. Neil's news recommendation system matches articles to your profile and continuously sends articles of interest. This is based on the principle of cosine similarity.

The principle of cosine similarity, shown in Figure 3, is based on the dot product of two document vectors. Multiplying a personal profile vector with an article vector shows how similar the two documents are: it measures how the documents are related by looking at the angle between their vectors instead of their magnitudes.
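The standard cosine-similarity formula can be written as a short function: the dot product of the two vectors, normalized by the product of their magnitudes, yielding a value between -1 (opposite directions) and 1 (identical direction).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Parallel vectors score 1.0, orthogonal vectors 0.0, and opposite
# vectors -1.0, regardless of their magnitudes.
```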

figure-3-principal
Figure 3 – The Principle of Cosine Similarity

As a result, Neil tracks both positive and negative tokens and tastes in your profile vector, so Neil gradually gets to know your news preferences. The more you use Neil, the better it can evaluate your likes and dislikes. You are then presented with a historical profile based on the news lists you have personalized and curated.

Applying the Principle in Neil

As shown in Figure 4, User 2's profile vector contains 'people', 'dog', 'travel', and 'animal' with credits greater than 0. Article 2 relates to these tokens and therefore shows a higher multiplication score than the other articles.

figure4-User-Multiplication-section
Figure 4 – User Multiplication Scores

Since every user profile vector has both positive and negative credits among its components, multiplying two vectors yields scores across a wide positive-to-negative range. Following the principle of cosine similarity above, large positive scores indicate near-identical direction, scores near 0 indicate no relation, and negative scores indicate opposite directions. The bigger the multiplication score, the greater the similarity.

In Figure 4, Article 2 receives the highest score when multiplied with User 2's profile. On the contrary, since Article 3 has a negative score, User 2 is presumed to dislike Article 3.

As a result, the highest-scoring article for User 2 is Article 2, and it ranks at the top of his or her news recommendations.
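The scoring and ranking steps described above can be sketched as follows. The vocabulary, profile credits, and article vectors are illustrative numbers in the spirit of Figure 4, not Neil's actual data.

```python
# Illustrative vocabulary and User 2's assumed credits per token.
VOCAB = ["people", "dog", "travel", "animal", "blood"]
profile = [5, 10, 10, 1, -8]

# Illustrative article vectors with 0/1/2 occurrence counts.
articles = {
    "Article 1": [1, 0, 1, 0, 0],
    "Article 2": [2, 2, 2, 2, 0],
    "Article 3": [0, 0, 0, 0, 2],
}

def score(profile_vec, article_vec):
    """Dot product of a profile vector and an article vector."""
    return sum(p * a for p, a in zip(profile_vec, article_vec))

ranked = sorted(
    articles, key=lambda name: score(profile, articles[name]), reverse=True
)
# Article 2's large positive score places it first, while Article 3's
# negative score (driven by the 'blood' token) places it last.
```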

Conclusion

Neil's personalization algorithm returns individualized output using the principle of cosine similarity, with three distinctive features: simplicity, clarity, and accuracy.

  1. Simplicity: the multiplication of two vectors via cosine similarity drives the recommendation system.
  2. Clarity: pre-screened RSS sources serve reliable articles by filtering out 'fake news'.
  3. Accuracy: a double process of classification, by RSS genre and by the individual's preferences.

With these features, Neil's algorithms remain accurate as data grows and can be applied to any architecture. Neil is cost-efficient and easy to manage within its technological processes, while still delivering results.

As simple systems become more complex, Neil's clear and accurate design remains flexible enough to adapt to rapidly changing markets. Neil's unique features and services offer core elements for advancing technology through AI (Artificial Intelligence) and AEI (Artificial Emotional Intelligence).

About Author:

 

I am Mina Jin, and I work in the Development Department at BPU Holdings' head office in Seoul, South Korea. I am interested in deep neural networks and excited to run experiments. The principles of technology inspire me to explore future possibilities.

Zimgo