Text Mining: Fact and Fiction
Posted by 数据挖掘者 (Data Miner) on 2008/1/1 23:00:43
Reposted from: http://www.texttechnologies.com/2007/12/23/text-mining-myths-realities/
Text mining - fact and fiction
Categories: Text mining. Posted by Curt Monash @ 6:37 pm

× Text mining is science-project artificial intelligence.
Fiction. Text mining is proven in many practical applications.

√ To implement text mining, you need computational linguists.
Fact. Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” And it’s linguists, or reasonable facsimiles of same, who do the consulting.

× To use text mining, you need computational linguists.
Fiction. When last I counted, the number of known computational linguists working for end-user organizations, worldwide, was precisely 1, at Procter & Gamble. (Intelligence agencies excepted, of course.) I’d guess it’s higher now, but I probably could still count them all without taking my socks off.

√ CRM applications are driving the growth of text mining.
Fact. Most current growth in text mining seems to come from Voice of the Customer and Voice of the Market/competitive intelligence applications. And a couple of years ago, when SAS and SPSS had a joint boom in text mining, a lot of that was coming from CRM.

√ (mostly) Text mining products are useful mainly for large enterprises.
More fact than fiction. Text mining makes the most sense when you have too much text for humans to read and summarize.

× Text mining doesn’t fit well with relational databases.
Fiction. The fastest-growing text mining companies seem to be Attensity and Clarabridge, who consistently extract textual information into relational databases.

√ (mostly) Text mining imposes structure on unstructured* data.
More fact than fiction. Most text mining applications involve examining free-text documents and creating entries in relational or XML databases. Most people would call that a transition from unstructured to structured form.
*I still don’t like the “structured/unstructured” distinction, but with repetition I’m getting somewhat inured to it.

√ Enterprise search is an alternative to text mining.
Fact. You can use a high-end search engine to cluster documents and look for trends and insight. It’s not the real McCoy, but in some cases it gives you 80% of the benefit of the real thing.

Half √, half × Text mining is an ingredient, not a product category.
Part fact, part fiction. The biggest text mining efforts in the world are probably at Google, Yahoo, Microsoft search, and Dow Jones/Factiva. Antispam vendors also invest a lot in text mining. Two of the top five independent text mining vendors were acquired this year (ClearForest and Inxight). And of the many dozens of small text mining independents, most are focused on specific niches. Even so, Attensity, Clarabridge, and Temis show that, at least for now, text mining remains a legitimate product category.

Half √, half × The text mining industry is in trouble.
Part fact, part fiction. As I recently ranted, even the leading text mining vendors are letting many opportunities pass them by. And like many software sectors, text mining seems poised to be absorbed via large-company acquisition. SAP has already secured a text mining business via BOBJ/Inxight, but at least one vendor each could easily be bought by Oracle, Microsoft (despite the in-house expertise from its search arm), and IBM (despite or even in connection with UIMA). But in the meantime, a few small text mining vendors are still showing rapid growth.

Previous “fact and fiction” post: Data warehouse appliances.
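The article's remark that most text mining applications read free-text documents and create entries in a relational database can be illustrated with a small sketch. Everything here is an assumption made for illustration: the sample comments, the keyword lists, and the `mention` table are hypothetical, and the keyword lookup is a toy stand-in for the linguistic extraction that commercial products perform. It is not the method of any vendor named in the article.

```python
import re
import sqlite3

# Hypothetical customer comments (illustrative only, not from the article).
COMMENTS = [
    "The battery died after two days. Very disappointed.",
    "Great screen, but the battery drains fast.",
    "Shipping was slow, otherwise I love the screen.",
]

# Toy lexicons standing in for a real extraction step.
TOPICS = ("battery", "screen", "shipping")
NEGATIVE = {"died", "disappointed", "drains", "slow"}
POSITIVE = {"great", "love"}


def extract_rows(doc_id, text):
    """Yield one (doc_id, topic, sentiment) row per clause that names a topic."""
    for clause in re.split(r"[.,;!?]", text.lower()):
        words = set(re.findall(r"[a-z]+", clause))
        for topic in TOPICS:
            if topic in words:
                if words & NEGATIVE:
                    sentiment = "negative"
                elif words & POSITIVE:
                    sentiment = "positive"
                else:
                    sentiment = "neutral"
                yield (doc_id, topic, sentiment)


def build_db(comments):
    """Load the extracted rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mention (doc_id INTEGER, topic TEXT, sentiment TEXT)"
    )
    for doc_id, text in enumerate(comments):
        conn.executemany(
            "INSERT INTO mention VALUES (?, ?, ?)", extract_rows(doc_id, text)
        )
    conn.commit()
    return conn


def negative_counts(conn):
    """Ordinary SQL now answers questions about the text."""
    return conn.execute(
        "SELECT topic, COUNT(*) FROM mention "
        "WHERE sentiment = 'negative' GROUP BY topic ORDER BY topic"
    ).fetchall()
```

Once the text has been reduced to rows, `negative_counts(build_db(COMMENTS))` is a plain SQL aggregate, which is the sense in which extraction imposes structure on the documents.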
Technorati Tags: Text mining, text analytics

Re: Text Mining: Fact and Fiction
kuhasu (guest) commented on 2008/8/24 16:05:42
Quoting kuhasu (guest)'s comment of 2008-1-26 16:34:31: "Text mining products are useful mainly for large enterprises." × A tool, just a tool. Maybe a product is expensive, but you can develop your own text mining tools. For text mining and voice mining, almost everyone needs them in fact, esp. text mining.
Blog owner's reply: Yes, the need for text mining is everywhere. I think what the author means is that most of the buyers are large enterprises, because small or medium-sized companies may not want to spend money on it. *^_^*

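A question
insKy (guest) commented on 2008/2/19 20:05:03
How is something like the related-searches feature at the bottom of a Baidu results page built? Is it done with data mining? Thanks! Could you explain in a little more detail? tks a lot
Blog owner's reply: It probably uses techniques such as keyword matching, classification schemes (taxonomies), and text clustering.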
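As a rough illustration of the keyword-matching idea mentioned in the reply, the sketch below ranks past queries by how many words they share with the current one, using Jaccard similarity. The query log, function names, and scoring choice are all assumptions made for this example; nothing here describes how Baidu actually implements related searches, and a real system would also need proper word segmentation for Chinese queries.

```python
def tokenize(query):
    """Split a query into a set of lowercase words (whitespace tokenizer)."""
    return set(query.lower().split())


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_queries(query, query_log, top_n=3):
    """Return up to top_n logged queries that share keywords with `query`."""
    target = tokenize(query)
    scored = []
    for candidate in query_log:
        tokens = tokenize(candidate)
        if tokens == target:
            continue  # skip the query itself
        score = jaccard(target, tokens)
        if score > 0:
            scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:top_n]]


# Hypothetical query log (illustrative only).
QUERY_LOG = [
    "text mining tools",
    "text mining tutorial",
    "data mining tools",
    "gold mining history",
    "python tutorial",
]

print(related_queries("text mining software", QUERY_LOG))
```

Keyword overlap alone treats "gold mining" and "data mining" as equally related to a text mining query, which is one reason the reply also mentions classification schemes and clustering.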
Re: Text Mining: Fact and Fiction
kuhasu (guest) commented on 2008/1/26 16:34:31
"Text mining products are useful mainly for large enterprises." × A tool, just a tool. Maybe a product is expensive, but you can develop your own text mining tools. For text mining and voice mining, almost everyone needs them in fact, esp. text mining.
Blog owner's reply: Yes, the need for text mining is everywhere. I think what the author means is that most of the buyers are large enterprises, because small or medium-sized companies may not want to spend money on it.