IPro Online: Your Text Processing PHP Libraries


   




Introduction

IPro Text Library allows you to ... texts.

Requirements

PHP 5

Installation

See Installation.

Load Library

To use a library, include the library's .inc.php file in your PHP script. Assuming your document root is set as $path="/home/m3/public_html/", the include statement for IPro Text is as follows.

IPro Text - include($path."ipro/text.inc.php");
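
The include step above can be wrapped in a small guard so that a wrong document root fails with a clear message instead of a bare include warning. This is a minimal sketch: ipro_library_path() is a hypothetical helper that is not part of IPro, and it assumes that every library file follows the same ipro/<name>.inc.php pattern shown for IPro Text.

```php
<?php
// Hypothetical helper, NOT part of IPro: builds the path of a library file
// such as "text.inc.php" under <document root>/ipro/.
// Assumption: other IPro libraries follow the same naming pattern as text.inc.php.
function ipro_library_path($path, $library)
{
    // Normalize the root so the result has exactly one slash before "ipro/".
    return rtrim($path, '/') . '/ipro/' . $library . '.inc.php';
}

$path = '/home/m3/public_html/';
$file = ipro_library_path($path, 'text');

// Include the library only when the file is actually present.
if (is_file($file)) {
    require_once $file;
} else {
    error_log("IPro Text library not found at $file");
}
```

Using require_once rather than include avoids loading the same file twice when several scripts pull in the library.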

Table of Contents

get_cap_phrases - Extract all capital phrases from text.
get_complete_phrases - Extract all complete phrases from text.
get_emails - Extract all emails from text.
get_token - Extract the token at a given position from text.
get_char_suffixes - Construct suffixes of a text based on position of every character in text.
get_char_suffix_lcp - Calculate the longest common prefix for char suffixes.
get_word_suffixes - Construct suffixes of a text based on position of every word in text.
get_word_suffix_lcp - Calculate the longest common prefix for word suffixes.
is_cap_phrase - Check if the first character of every word in text is an uppercase letter.
is_email - Check if text is an email address.
is_organization - Check if text is an organization entity.
is_date - Check if text is a date entity.
is_stop_word - Check if text is a stop word.
is_title - Check whether a text is a title.
rmv_stopword - Remove stop words from text.
term_poss - Find positions of all occurrences of a string.
term_iposs - Case-insensitive version of term_poss().
tok_by_stopword - Tokenize text by stop words.
tok_by_punc - Tokenize text by punctuation.
tok_by_symbol - Tokenize text by specified delimiter characters.
tok_by_word - Tokenize text by specified delimiter words.
txt_to_ngram_token - Generate n-gram tokens from text.
txt_to_ngram_word - Generate n-gram words from text.
txt_to_sentence - Tokenize text into sentences.
txt_to_sentc_icl_pos - Tokenize text into sentences (including the start and end positions of the sentences).
txt_to_token - Tokenize text into tokens.
txt_to_token_icl_pos - Tokenize text into tokens (including the start and end positions of the tokens).
txt_to_word - Tokenize text into words.
txt_to_word_icl_pos - Tokenize text into words (including the start and end positions of the words).
get_cap_phrases_icl_pos - Extract all capital phrases from text (including the start and end positions of the capital phrases).
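
To make a few of the one-line descriptions above concrete, the sketch below shows what character suffixes, a longest common prefix, word n-grams, and term positions mean on a small input. These are illustrative stand-ins only: the sketch_* functions are hypothetical, they are not the IPro functions, and their parameters and return shapes are assumptions rather than the library's documented signatures.

```php
<?php
// Illustrative stand-ins only; NOT the IPro API. Names, parameters, and
// return shapes are assumptions made for this sketch.

// Character suffixes: one suffix per character position, keyed by position.
function sketch_char_suffixes($text)
{
    $suffixes = array();
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $suffixes[$i] = substr($text, $i);
    }
    return $suffixes;
}

// Length of the longest common prefix of two strings.
function sketch_lcp($a, $b)
{
    $max = min(strlen($a), strlen($b));
    $n = 0;
    while ($n < $max && $a[$n] === $b[$n]) {
        $n++;
    }
    return $n;
}

// Word n-grams: every run of $n consecutive whitespace-separated words.
function sketch_ngram_words($text, $n)
{
    if ($n < 1) {
        return array();
    }
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($words);
    $grams = array();
    for ($i = 0; $i + $n <= $count; $i++) {
        $grams[] = implode(' ', array_slice($words, $i, $n));
    }
    return $grams;
}

// Byte offsets of all (possibly overlapping) occurrences of $term in $text.
function sketch_term_positions($text, $term)
{
    $positions = array();
    if ($term === '') {
        return $positions;
    }
    $offset = 0;
    while (($pos = strpos($text, $term, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1; // step one byte forward so overlaps are found
    }
    return $positions;
}

print_r(sketch_char_suffixes('banana'));
print_r(sketch_ngram_words('the quick brown fox', 2));
print_r(sketch_term_positions('banana', 'ana'));
```

A case-insensitive variant of the position search, in the spirit of term_iposs, could swap strpos for stripos; whether IPro counts overlapping matches is not stated here, so that choice is also an assumption.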
Copyright 2012 by Mice3 Software. All Rights Reserved.