Saturday, December 21, 2013

I built and packaged Hyperic HQ 5.8, please download...

In December, VMware released vCenter Hyperic 5.8, so maybe Hyperic Community Edition has matured as well.
But on sf.net and the official website there is no sign of HQ v5.8.
So I built it, packaged it, and uploaded it to the internet.

Please enjoy it:
http://pan.baidu.com/s/1c0CFl5I

It covers all platforms, including the AIX and HP-UX agents.


The discussion site is http://hq.innovatedigital.com/



Monday, December 9, 2013

Nutch 1.7 or Nutch 2.1? Setting up a Nutch 1.7 environment

Is Nutch 1.7 enough for about 100,000 pages? I tried it.
The valuable content is not always that large. When I crawled fewer than 300 websites, it occupied less than 1 GB. If you have limited CPU and memory, Nutch 1.7 is a good choice.

So I think Nutch 1.7 fits this job.

I set up Nutch 1.7 with Solr 4.6, and there were some problems.
  • Nutch reported:

2013-12-07 11:26:08,540 INFO  parse.ParseSegment - Parsed (0ms):http://www.bjmm.org.cn/outpart/managerarticle.do?method=page&articleId=1651
2013-12-07 11:26:08,545 WARN  mapred.LocalJobRunner - job_local1202925054_0001
java.lang.Exception: java.lang.OutOfMemoryError: unable to create new native thread
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)

Wrong solution: set fetcher.parse to true and fetcher.store.content to false. It did not help.
Right solution: modify the code as described in https://issues.apache.org/jira/browse/NUTCH-1640?jql=project%20%3D%20NUTCH%20AND%20text%20~%20%22create%20thread%22

Something like this, in the file src/java/org/apache/nutch/parse/ParseSegment.java. Add a field:
private ParseUtil parseUtil = null;

Then replace parseResult = new ParseUtil(getConf()).parse(content); with

      if (parseUtil == null) 
          parseUtil = new ParseUtil(getConf());
      parseResult = parseUtil.parse(content);

Rebuild with ant and run again. It works.
  • Which PHP client is suitable? Solarium, Solr-client-php, ...
Solarium is a good choice because it has been updated recently.
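
Going back to the ParseSegment change: here is a small standalone sketch of why the null check helps. It does not use Nutch at all. FakeParseUtil is a hypothetical stand-in that only counts how many times it gets constructed, and the class and method names are made up for the illustration. My assumption is that the real ParseUtil is expensive to create (the error above suggests each instance brings up its own threads), so creating one per parsed page eventually runs out of native threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyParseUtilSketch {

    // Hypothetical stand-in for Nutch's ParseUtil. It only counts how often
    // it is constructed, so the cost of "new" per record becomes visible.
    static class FakeParseUtil {
        static final AtomicInteger CREATED = new AtomicInteger();

        FakeParseUtil() {
            CREATED.incrementAndGet();
        }

        String parse(String content) {
            return content.trim();
        }
    }

    // Created on first use, then reused for every later record.
    private FakeParseUtil parseUtil = null;

    // Old behaviour: a fresh helper object for each record.
    String parseWithNewInstance(String content) {
        return new FakeParseUtil().parse(content);
    }

    // Patched behaviour: the same null check as in the ParseSegment change.
    String parseWithSharedInstance(String content) {
        if (parseUtil == null) {
            parseUtil = new FakeParseUtil();
        }
        return parseUtil.parse(content);
    }

    public static void main(String[] args) {
        LazyParseUtilSketch segment = new LazyParseUtilSketch();

        for (int i = 0; i < 1000; i++) {
            segment.parseWithNewInstance(" page " + i + " ");
        }
        System.out.println("per-record instances: " + FakeParseUtil.CREATED.get());

        FakeParseUtil.CREATED.set(0);
        for (int i = 0; i < 1000; i++) {
            segment.parseWithSharedInstance(" page " + i + " ");
        }
        System.out.println("shared instances: " + FakeParseUtil.CREATED.get());
    }
}
```

With 1000 records the first loop constructs 1000 helpers and the second constructs only one. The sketch is single-threaded, so the plain null check is enough here. If the same object were called from several threads at once, the lazy initialization would need synchronization.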