Tutorials: Nov 13, 2011, Shanghai, China
Main Conference: Nov 14-17, 2011, Shanghai, China
Workshops: Nov 18, 2011, Hangzhou, China
The final program of ICONIP 2011 and the book of abstracts are now available.

Semantic Analysis and Search of Twitter and Chinese Weibo


Ming Zhou
Microsoft Research Asia


The objective of this tutorial is to give the audience an overview of recent developments in the semantic analysis and search of micro-blogs such as Twitter and Chinese micro-blogs (weibo).

In recent years, we have witnessed tremendous interest in Twitter and weibo for data mining, business intelligence, and search. However, the explosion of data and the unique language usage on Twitter and weibo present big challenges to these efforts. Current search technologies, which are based on PageRank and keyword matching, can only return chronologically ordered tweets, so users need a long time to read the tweets and understand the information they contain. Semantic analysis of Twitter is a technology for distilling the key information from a large volume of real-time tweets to support data mining and search of tweets.

This tutorial presents a pipeline of semantic analysis technologies, starting from tweet text normalization, named entity identification, semantic role labeling, and sentiment analysis, continuing to tweet classification and clustering, and ending with the detection of influential accounts, hot topics, popular news, and interesting communities. We will elaborate on sentiment analysis. In the last part of this tutorial, we will present a method of ranking tweets with multi-faceted features drawn from both the tweet content and the social network. A prototype Twitter search engine will also be demonstrated.
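
As a rough illustration of ranking tweets with features from both content and the social network, the sketch below combines a keyword-overlap score with a squashed popularity score. The `Tweet` fields, the two scoring functions, and the weights are all hypothetical choices made for this example; they are not the method presented in the tutorial.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    retweets: int          # assumed social signal
    author_followers: int  # assumed social signal


def content_score(tweet, query):
    """Fraction of query terms that appear in the tweet text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(tweet.text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def social_score(tweet):
    """Squash retweet and follower counts into the range [0, 1)."""
    signal = tweet.retweets + tweet.author_followers / 1000
    return signal / (1 + signal)


def rank(tweets, query, w_content=0.7, w_social=0.3):
    """Sort tweets by a weighted sum of content and social scores."""
    def score(t):
        return w_content * content_score(t, query) + w_social * social_score(t)
    return sorted(tweets, key=score, reverse=True)


tweets = [
    Tweet("new phone released today", retweets=0, author_followers=10),
    Tweet("new phone review", retweets=50, author_followers=20000),
    Tweet("lunch time", retweets=1000, author_followers=1000000),
]
ranked = rank(tweets, "new phone")
```

With these illustrative weights, a relevant tweet from a well-followed account ranks above an equally relevant tweet with no social signal, and an off-topic but popular tweet ranks last.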

Content and Benefits:

A tweet contains rich information about news, pictures, music, and videos, and carries strong signals of breaking news, hot topics, and trends. The implicit consumer feedback and wisdom of crowds, if they can be mined, are valuable to both end users and enterprise users for various business needs. End users can get customized information in real time based on their personal needs. The various types of extracted information are very useful to business users. After a company releases a new product, it would like to know users' feedback on the product. The opinions mined from a large number of real-time tweets help the company quickly understand the strengths and shortcomings of the product. Furthermore, the social impact score of a Twitter account can be used to find influential people, which helps a user find appropriate people to follow. It is also highly valuable for an online advertising company, which can place ads in communities that contain a large number of potential users.

However, the useful information is overwhelmed by a flood of noise. The largest proportion of tweets is pointless babble, spam, and offensive content, all of little value. Besides, a tweet is a fragment of text of up to 140 characters that contains various kinds of abbreviations and abnormal language use, which makes text analysis even more difficult for standard text mining technologies such as named entity identification, part-of-speech tagging, sentiment analysis, semantic role labeling, event extraction, and text classification.
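
To make the noise problem concrete, here is a minimal sketch of rule-based tweet text normalization. The abbreviation table, the `<url>` and `<user>` placeholders, and the rule for collapsing repeated characters are assumptions made for illustration only; a real normalizer would need far larger resources and is not described in this abstract.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "2moro": "tomorrow",
    "thx": "thanks",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character


def normalize(tweet):
    """Replace URLs and mentions, shorten character runs, expand abbreviations."""
    text = URL_RE.sub("<url>", tweet)
    text = MENTION_RE.sub("<user>", text)
    text = REPEAT_RE.sub(r"\1\1", text)  # "sooooo" becomes "soo"
    tokens = [ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split()]
    return " ".join(tokens)


cleaned = normalize("@bob u r gr8 sooooo cool http://t.co/x")
```

Downstream components such as a named entity tagger or sentiment classifier would then operate on the cleaned text rather than the raw tweet.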

In this tutorial, the audience will learn a set of technologies that we have developed over the last two years at Microsoft Research Asia to handle the special issues of tweets and support useful scenarios, notably Twitter search. For instance, we will present a semi-supervised method for enhancing named entity recognition in tweets. We will elaborate on sentiment analysis, including the simple lexicon-based approach, the machine-learning-based approach, and a new method that uses a graph model to improve the accuracy of sentiment analysis. In the last part of this tutorial, we will present a preliminary exploration of Twitter search that ranks retrieved tweets with multiple features from tweet content and the social network. Finally, we will present Quickview, a prototype Twitter search system that demonstrates the technologies of semantic analysis and search.
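
The simple lexicon-based approach to sentiment analysis mentioned above can be sketched as follows. The word lists and the one-token negation rule are hypothetical and deliberately tiny; they illustrate the general idea only, not the system developed at Microsoft Research Asia.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}
NEGATIONS = {"not", "no", "never"}


def lexicon_sentiment(text):
    """Label text by counting lexicon hits; a negation flips the next token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # negation only affects the token right after it
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_sentiment("I love this phone, great screen")
```

A lexicon approach like this needs no training data but misses context, which is one reason machine-learning-based and graph-based methods are also covered.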

Target Audience:

This tutorial is for practitioners and researchers in natural language processing, social networks, and search engines who would like to learn techniques and methods for semantic analysis and search of Twitter and Chinese weibo.


Dr. Ming Zhou’s research interests include natural language processing, machine translation, search engines, and semantic analysis and search of news and social networks. In these research areas, he has about 100 technical publications in journals and conferences. In addition, he holds over 20 US patents. He is an associate editor of ACM Transactions on Asian Language Information Processing, the International Journal of Machine Translation, and the International Journal of Computational Linguistics. Notably, he is the key inventor and research leader of many well-known software products and services, including: the first Chinese-English machine translation system (CEMT-I, 1989); Bing Dictionary (for English learning, translating, and writing), winner of the Asian Innovation Award Reader’s Choice Award selected by the Wall Street Journal in 2010; a popular AI game for Chinese couplet generation; MS IME for Chinese and Japanese (both the Chinese couplets game and the IME are winners of the top innovations from MSRA in its first 10 years, 1998-2008); and the J-Beijing Chinese-Japanese translation system (the J-Server language translation service powered by J-Beijing won the Nagao Award granted by the Asia-Pacific Association for Machine Translation (AAMT) in 2008).

Ming Zhou has been the manager of the Natural Language Computing Group at Microsoft Research Asia since 2001. He is an expert in the areas of machine translation and natural language processing. He received his Bachelor's degree from Chongqing University and his PhD from Harbin Institute of Technology, both in computer engineering and science. He was a post-doc in the Computer Science Department at Tsinghua University in 1991 and became an associate professor at the same university in 1993. He joined Microsoft Research Asia in 1999 as a researcher.

Copyright© 2010-2011 International Conference on Neural Information Processing. All rights reserved.