ai-labs.org blog

Automatic Realtime Mind Map Builder From Mic.

MM-demo

MM-demo is a project aimed at building a mind map of a conversation in real time. The operation of MM-demo can be divided into several stages:

Detect source-code regions with Neural Networks.

Motivation

Machine learning and other text-processing routines quite often require pre-processing of raw textual data (for example, text received from the Internet). The problem described here is separating regular plain-text regions from regions that contain script code.
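
The post itself uses neural networks, which are not reproduced here. As a rough illustration of how the task can be framed, the sketch below labels each line as code or text with a hand-written heuristic (an assumed baseline, not the post's model) and then merges runs of same-labelled lines into regions. The keyword list, symbol set, and thresholds are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical hand-picked cues; a trained neural network would learn such cues instead.
CODE_KEYWORDS = re.compile(r"\b(def|return|import|function|var|let|const|if|for|while|class)\b")
CODE_SYMBOLS = set("{}();=<>[]")


def looks_like_code(line: str) -> bool:
    """Heuristic per-line decision: at least two of three code cues must fire."""
    stripped = line.strip()
    if not stripped:
        return False
    symbol_ratio = sum(ch in CODE_SYMBOLS for ch in stripped) / len(stripped)
    ends_like_statement = stripped.endswith((";", "{", "}", ":", ")"))
    has_keyword = bool(CODE_KEYWORDS.search(stripped))
    score = (symbol_ratio > 0.05) + ends_like_statement + has_keyword
    return score >= 2


def segment(text: str) -> List[Tuple[str, str]]:
    """Group consecutive non-blank lines with the same label into (label, region) pairs."""
    regions: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no signal; skip them
        label = "code" if looks_like_code(line) else "text"
        if regions and regions[-1][0] == label:
            regions[-1][1].append(line)  # extend the current region
        else:
            regions.append((label, [line]))  # start a new region
    return [(label, "\n".join(lines)) for label, lines in regions]
```

A learned classifier would replace `looks_like_code` while keeping the same line-labelling and merging structure.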

Context-free grammars for English noun phrases in Brown terms.

We designed an extremely simple noun-phrase (NP) extractor. See it on GitHub.

A simple noun-phrase extractor that uses a context-free grammar appears to be more robust on noisy input and grammatically broken structure.

There are many modern NLP tools, like spaCy, that can find noun phrases. All of them work well on consistent text that can be parsed into a correct syntax tree. When the syntax is broken, POS taggers and tree parsers become weak and the results degrade. This solution instead applies extremely simple context-free grammars on top of a very simple and, in general, less accurate POS tagger, relying on that tagger's only strong side: robustness on noisy, broken text. It is expected to be much weaker on normal text, but to perform better on input such as Twitter posts and chats.
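
The idea can be illustrated with a minimal sketch. It is not the grammar from the GitHub project: it assumes a single hypothetical rule, NP → AT? JJ* (NN | NNS | NP | NPS)+, over Brown-corpus tags, and it takes already-tagged (word, tag) pairs as input. Because the rule is matched locally, a stray or mistagged token can only affect the phrase around it rather than an entire parse tree.

```python
from typing import List, Tuple

# Assumed subset of Brown-corpus tags: AT = article, JJ = adjective,
# NN/NNS = common noun (sg/pl), NP/NPS = proper noun (sg/pl).
DET = {"AT"}
ADJ = {"JJ"}
NOUN = {"NN", "NNS", "NP", "NPS"}


def extract_nps(tagged: List[Tuple[str, str]]) -> List[str]:
    """Greedily match NP -> AT? JJ* NOUN+ over a POS-tagged token sequence."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] in DET:  # optional article
            j += 1
        while j < n and tagged[j][1] in ADJ:  # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN:  # one or more nouns
            k += 1
        if k > j:  # at least one noun: accept tokens i..k-1 as an NP
            phrases.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1  # no noun head here; slide forward one token
    return phrases
```

For example, on the tagged tokens `the/AT quick/JJ fox/NN lol/UH jumps/VBZ over/IN lazy/JJ dogs/NNS` the sketch returns `["the quick fox", "lazy dogs"]`, ignoring the interjection that would confuse a full parser.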

Quality evaluation of the spaCy model for Turkish.

To estimate the performance of the model, we created a module, comparator.py. For parsing the CoNLL format we use some modules of CoNLL-U Parser directly, without installing the package. CoNLL-U Parser parses a CoNLL-formatted string into an ordered Python dictionary.
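
The code of comparator.py is not shown here, so the following is only a hypothetical, standard-library sketch of the same two steps: parse CoNLL-U text into ordered dictionaries (one per token) and compare gold and predicted parses. The choice of metrics (unlabelled and labelled attachment scores, UAS/LAS) is an assumption, and the sketch assumes both files are tokenized identically.

```python
from collections import OrderedDict
from typing import List, Tuple

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[OrderedDict]]:
    """Parse CoNLL-U text into sentences, each a list of OrderedDicts (one per token)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment / metadata line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        current.append(OrderedDict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold: List[List[OrderedDict]],
                      predicted: List[List[OrderedDict]]) -> Tuple[float, float]:
    """Return (UAS, LAS): share of tokens with the correct head, and with correct head and label."""
    total = uas = las = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            if g["head"] == p["head"]:
                uas += 1
                if g["deprel"] == p["deprel"]:
                    las += 1
    return uas / total, las / total
```

Skipping multiword-token lines matters for Turkish, where a single surface word can be split into several syntactic tokens.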

Result of Turkish Model Training.

  1. Run Jupyter Notebook and create a new notebook in the browser window. Import spacy and displacy, and load the tr_unnamed model.
    import spacy
    from spacy import displacy
    nlp = spacy.load('tr_unnamed')
    
Adding spaCy support for Turkish. Part 2

Before deciding whether we can add support for the Turkish language, we need to check the availability of basic training data. We went through the publicly available Turkish corpora and analysed them.

ai-labs relaunch

Going open

ai-labs.org has been working on a non-public project for two years now. We are glad to announce that ai-labs is ending its current mode of working and becoming a more open company.

Visualize parsing results

This is just a primitive “setup and use” snippet.

It is very easy to use Jupyter Notebook with nbextensions to visualize tests of a spaCy model. The displaCy Jupyter extension is a simple Jupyter Notebook extension that lets you visualize a JSON-formatted dependency parse using the displaCy visualizer.
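
The exact JSON the extension expects is not shown here. As an assumption, the sketch below targets the dictionary format that spaCy's own `displacy` module accepts in manual mode (a list of `words` with `text`/`tag`, and `arcs` with `start`, `end`, `label`, `dir`), built from CoNLL-style rows where `head` is 1-based and 0 marks the root. The input rows are hypothetical.

```python
from typing import Dict, List, Tuple


def to_displacy(tokens: List[Tuple[str, str, int, str]]) -> Dict[str, list]:
    """Convert (text, tag, head, deprel) rows into displaCy's manual dependency format."""
    words = [{"text": text, "tag": tag} for text, tag, _, _ in tokens]
    arcs = []
    for i, (_, _, head, rel) in enumerate(tokens):
        if head == 0:
            continue  # the root token gets no incoming arc
        h = head - 1  # convert the 1-based head to a 0-based index
        # displaCy wants start < end; "dir" tells which end the arrow points to.
        arcs.append({"start": min(i, h), "end": max(i, h), "label": rel,
                     "dir": "left" if i < h else "right"})
    return {"words": words, "arcs": arcs}
```

In a notebook with spaCy installed, the resulting dict can be rendered with `displacy.render(data, style="dep", manual=True, jupyter=True)`.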

English Sentence Rewriting Model Design And Training.

The technical implementation of a sentence rewriter based on a seq2seq encoder can be taken from TensorFlow, software developed by Google that simplifies building and training neural networks.
