A Polish company uses deep neural networks to fight disinformation on the Internet

A project by a Polish company, based on deep neural networks, automatically searches the network for trolls and fights disinformation. It will also evaluate and predict publications, and suggest effective forms of business publication on social networking sites.

Fighting online disinformation is a huge challenge for both social networking sites and businesses. At the same time, disinformation is a serious threat to individual users as well as to entire communities, organizations, countries and companies. Disinformation campaigns can target not only politicians directly, but also national minorities.

Twitter made an interesting discovery a few months ago when it researched how its algorithm recommends political content to users. It turned out that its algorithms amplified tweets from right-wing political parties and news outlets more than those from left-wing or liberal ones.

It is therefore not surprising that thousands of disinformation posts about the war in Ukraine have recently appeared on Twitter.

Artificial intelligence is now being used to fight such disinformation. Algorithms created by a Polish company using deep machine learning are already working on Twitter, recognizing trolls with very high accuracy. MIM Solutions is responsible for the project. Every day, a dedicated artificial-intelligence model reads data publicly available on Twitter in search of posts and profiles "dealing" in disinformation, picks them out and flags them. What is most innovative is that the company needs only about 50 examples of such profiles and tweets to "catch" a troll, instead of a large number of analysed data sets.
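The article does not describe the company's architecture, but one reason a handful of labelled examples can suffice is that, when the features are informative, even a very simple rule separates the classes. Below is a toy nearest-centroid classifier that learns from a few examples per class; the feature names, numbers and labels are entirely hypothetical and this is not MIM Solutions' actual method:

```python
# Toy few-shot classification: average each class's examples into a centroid,
# then assign a new point to the class with the nearest centroid.
# All features and values below are invented for illustration only.

def centroid(vectors):
    # Component-wise mean of a list of equal-length feature vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance_sq(a, b):
    # Squared Euclidean distance (square root is not needed for comparison).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(x, centroids):
    # Return the label whose centroid is closest to x.
    return min(centroids, key=lambda label: distance_sq(x, centroids[label]))

# Hypothetical 2-D features, e.g. posting frequency and duplicate-content ratio.
troll_examples = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.85]]
normal_examples = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15]]

centroids = {
    "troll": centroid(troll_examples),
    "normal": centroid(normal_examples),
}

label = classify([0.85, 0.9], centroids)  # → "troll"
```

Real few-shot systems work the same way in spirit, but over learned embeddings of text rather than hand-picked numbers.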

The algorithms read what appears on the social network each day, then analyse tweets and flag suspicious profiles. The project is currently at the proof-of-concept stage. Notably, the process is not manual but fully automated: the experts at MIM Solutions have built a model based on deep neural networks that reads Twitter automatically.

A deep neural network is a neural network with a certain level of complexity: experts define it as a network with an input layer, an output layer and at least one hidden layer in between. Deep neural networks use sophisticated mathematical modelling to process data in complex ways. A neural network can therefore be described as a technology built to loosely simulate the activity of the human brain, specifically pattern recognition and the passage of input through successive layers of simulated neural connections. Each layer performs a specific kind of sorting and ordering, in a process sometimes called a "feature hierarchy." One of the key uses of these networks is dealing with unlabelled or unstructured data. The phrase "deep learning" is also used to describe such networks, as deep learning is the form of machine learning in which information is classified and ordered in ways that go beyond simple input/output mappings.

The Polish company puts the accuracy of its model very high, at around 95 percent or more. The model will soon be turned into a Twitter bot that searches for trolling, and not only trolling related to the war in Ukraine. The system will also gain the ability to evaluate, analyse and predict specific publications on the web, along with their value for companies.

From the technical side, deep neural networks for detecting fake news typically start with finding a dataset. Next come exploratory data analysis, cleaning the dataset and analysing the trends found within it. The collected data is then preprocessed, for example by lowercasing all characters and removing punctuation. The NLTK library is typically applied for further preprocessing of the dataset, which involves tokenization, lemmatization and removal of stop words. Finally, a model is trained to categorise the text, such as an LSTM model or a BERT model (Bidirectional Encoder Representations from Transformers).
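The preprocessing steps described above can be sketched as follows. To keep the example self-contained it uses only the Python standard library: the tiny stop-word set and the crude suffix-stripping "lemmatizer" are rough stand-ins for NLTK's stopword list and WordNetLemmatizer, which a real pipeline would use instead:

```python
import re
import string

# Tiny illustrative stop-word set; NLTK's English list is far larger.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "this", "and", "to"}

def naive_lemmatize(token):
    # Crude suffix stripping standing in for NLTK's WordNetLemmatizer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    text = text.lower()                                               # lowercase all characters
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = re.findall(r"[a-z]+", text)                              # simple tokenization
    return [naive_lemmatize(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("Trolls are SPREADING misleading claims, again!")
# → ["troll", "spread", "mislead", "claim", "again"]
```

The resulting token lists would then be converted to numeric sequences or embeddings and fed to the classifier, whether LSTM- or BERT-based.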