MM-demo is a project aimed at building a mind map of a conversation in real time. The operation of MM-demo can be divided into several stages:
It is well known that machine learning and other text-processing routines quite often require pre-processing of raw textual data (such as something retrieved from the Internet). The problem described here is separating regions of regular plain text from regions containing script code.
We designed an extremely simple NP (noun phrase) extractor. See it on GitHub.
A simple noun-phrase extractor that uses a context-free grammar appears to be more robust on noisy input with grammatically broken structure.
There are many modern NLP tools, such as spaCy, that can find noun phrases. All of them work well on consistent text that can be parsed into a correct syntax tree. When the syntax is broken, POS taggers and tree parsers become unreliable and results degrade. This solution aims to use extremely simple context-free grammars on top of a very simple and generally less accurate POS tagger, exploiting its only strength: robustness to noisy, broken text. It is expected to be much weaker on normal text but to perform better on input such as Twitter posts and chats.
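To make the idea concrete, here is a minimal sketch of grammar-based noun-phrase extraction over POS tags. It is not the code from the repository: the tag set, the `TAG_CODES` mapping, the function name `extract_noun_phrases` and the single rule NP -> DET? ADJ* NOUN+ are all assumptions chosen for illustration. The rule is flat (a regular pattern, which is a special case of a context-free grammar), and the tags are assumed to come from some simple tagger that is outside the scope of the sketch.

```python
import re

# Hypothetical mapping from coarse POS tags to one-character codes.
# One character per token means regex match offsets equal token indices.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "PROPN": "N"}

# Illustrative rule: NP -> DET? ADJ* NOUN+
NP_RULE = re.compile(r"D?A*N+")


def extract_noun_phrases(tagged_tokens):
    """Return noun phrases found in a list of (token, coarse_tag) pairs."""
    # Any tag outside TAG_CODES becomes "x" and simply breaks a phrase.
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged_tokens)
    phrases = []
    for match in NP_RULE.finditer(codes):
        words = [tok for tok, _ in tagged_tokens[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


# A noisy, chat-like input; the tags are written by hand for the example.
tagged = [
    ("omg", "INTJ"), ("the", "DET"), ("new", "ADJ"), ("phone", "NOUN"),
    ("battery", "NOUN"), ("is", "VERB"), ("sooo", "ADV"), ("bad", "ADJ"),
    ("lol", "INTJ"),
]
print(extract_noun_phrases(tagged))  # → ['the new phone battery']
```

Because the rule only looks at local tag sequences, tokens the tagger cannot classify interrupt a phrase instead of breaking a whole parse tree, which is the robustness property the paragraph above relies on.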
To estimate the performance of the model, a library, comparator.py, was created. To parse the CoNLL format we use some modules of CoNLL-U Parser without installing it. CoNLL-U Parser parses a CoNLL-formatted string into an ordered Python dictionary.
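As an illustration of what parsing a CoNLL-U string into ordered dictionaries can look like, here is a standalone sketch. It is neither the CoNLL-U Parser library's code nor comparator.py: the function name `parse_conllu` and the lowercase field names are assumptions, chosen to follow the ten standard CoNLL-U columns.

```python
from collections import OrderedDict

# The ten tab-separated columns defined by the CoNLL-U format.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences.

    Each sentence is a list of OrderedDicts, one per token line.
    """
    sentences, current = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, ignored in this sketch
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError("expected 10 columns, got %d" % len(columns))
        token = OrderedDict(zip(FIELDS, columns))
        # Plain integer ids and heads are converted; multiword ranges
        # such as "1-2" and the "_" placeholder stay as strings.
        for key in ("id", "head"):
            if token[key].isdigit():
                token[key] = int(token[key])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


sample = "\n".join([
    "# text = Merhaba dünya",
    "\t".join(["1", "Merhaba", "merhaba", "INTJ", "_", "_", "2", "discourse", "_", "_"]),
    "\t".join(["2", "dünya", "dünya", "NOUN", "_", "_", "0", "root", "_", "_"]),
    "",
])
for token in parse_conllu(sample)[0]:
    print(token["id"], token["form"], token["head"], token["deprel"])
```

With gold and predicted files parsed this way, a comparison can be done token by token on the `head` and `deprel` fields.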
import spacy
from spacy import displacy
nlp = spacy.load('tr_unnamed')
Before deciding whether we can add support for the Turkish language, we need to check the availability of basic training data. We went through the publicly available Turkish corpora and analysed them.
ai-labs.org has been working on a non-public project for 2 years now. We are glad to announce that ai-labs is finalizing its current mode of working and becoming a more open company.
This is just a primitive “setup and use” snippet.
It is very easy to use Jupyter Notebook with nbextensions to visualize the tests of a spaCy model. The displaCy Jupyter extension is a simple extension for Jupyter Notebook that lets you visualize a JSON-formatted dependency parse using the displaCy visualizer.
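The JSON-formatted dependency parse mentioned above can be sketched as follows. The helper names `to_displacy_manual` and `render_html` are hypothetical, and the input convention (0-based head indices, with the root pointing to itself) is an assumption. The resulting dictionary follows the `words`/`arcs` layout that `displacy.render` accepts when called with `manual=True`.

```python
def to_displacy_manual(words, tags, heads, labels):
    """Build a displaCy "manual" parse dict from parallel lists.

    heads holds 0-based token indices; the root token points to itself.
    """
    arcs = []
    for i, (head, label) in enumerate(zip(heads, labels)):
        if head == i:
            continue  # the root has no incoming arc
        if i < head:
            # Dependent is to the left of its head.
            arcs.append({"start": i, "end": head, "label": label, "dir": "left"})
        else:
            # Dependent is to the right of its head.
            arcs.append({"start": head, "end": i, "label": label, "dir": "right"})
    return {
        "words": [{"text": w, "tag": t} for w, t in zip(words, tags)],
        "arcs": arcs,
    }


def render_html(parse):
    """Render the parse to SVG markup; requires spaCy to be installed."""
    # Imported lazily so the converter above works without spaCy.
    from spacy import displacy
    return displacy.render(parse, style="dep", manual=True, jupyter=False)


parse = to_displacy_manual(
    words=["This", "is", "a", "sentence"],
    tags=["DET", "AUX", "DET", "NOUN"],
    heads=[1, 1, 3, 1],
    labels=["nsubj", "ROOT", "det", "attr"],
)
```

Inside a notebook, calling `displacy.render(parse, style="dep", manual=True, jupyter=True)` should display the tree inline instead of returning markup.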
The technical implementation of a sentence rewriter based on a seq2seq encoder can be taken from TensorFlow, software designed by Google that simplifies building and training neural networks.