My previous startup, Unbound Concepts, created a machine learning algorithm that determined the textual complexity (i.e., reading level) of children's literature. Our approach started as a natural language processing problem, extracting language features to train our algorithms, and quickly became a big data problem when we realized how much literature we had to process to produce meaningful representations. We chose to combine NLTK and Hadoop to create our Big Data NLP architecture, and we learned some useful lessons along the way. This series of posts is based on a talk given at the April Data Science DC meetup.
Think of this post as the CliffsNotes of the talk and the upcoming series of posts so you don't have to read every word … but trust me, it's worth it.
Related to the interaction between Big Data and NLP:
- Natural Language Processing needs Big Data
- Big Data doesn’t need NLP… yet.
Related to using Hadoop and NLTK:
- The combination of NLTK and Hadoop is perfect for preprocessing raw text (see the sketch after this list)
- More semantic analyses tend to be graph problems that MapReduce isn't great at computing.
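To make the preprocessing point concrete, here is a minimal sketch of what an NLTK tokenization step can look like as a Hadoop Streaming mapper. This is an illustrative example, not the code from the talk: the file name mapper.py is hypothetical, and it assumes NLTK (along with its punkt tokenizer data) is installed on every node in the cluster.

```python
#!/usr/bin/env python
# mapper.py -- hypothetical Hadoop Streaming mapper that preprocesses
# raw text with NLTK. Assumes NLTK and its 'punkt' tokenizer data are
# available on every node.

import sys
import nltk


def main():
    # Hadoop Streaming feeds input records to the mapper on stdin,
    # one line of raw text at a time.
    for line in sys.stdin:
        # Split the raw text into sentences, then into word tokens.
        for sentence in nltk.sent_tokenize(line.strip()):
            for token in nltk.word_tokenize(sentence):
                # Emit each lowercased token with a count of 1; a
                # reducer can sum these into corpus-wide frequencies,
                # one kind of language feature used for training.
                print("%s\t%d" % (token.lower(), 1))


if __name__ == "__main__":
    main()
```

A run would then look something like `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input raw_text/ -output token_counts/` (paths, jar name, and the summing reducer are placeholders). The appeal of this pattern is that each line of text is independent, which is exactly the embarrassingly parallel shape MapReduce handles well, in contrast to the graph-structured semantic problems mentioned above.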
About data products in general:
- The foo of Big Data is the ability to take domain knowledge and a data set (or sets) and iterate quickly through hypotheses using available tools (e.g., NLP).
- The magic of Big Data is that there is currently a surplus of both data and domain knowledge, and our tools work, so it's easy to come up with a data product (until demand meets supply).
I'll go over each of these points in detail as I did in my presentation, so stay tuned for the longer version [editor: so long that it has been broken up into multiple posts].