Tuesday, November 26, 2013

Setting up Nutch 2.2.1, with some problems

These days I am setting up a search engine for some official websites. The reason is that when I search for technical answers, the confusing results often lead me down the wrong path. In the end, I find the official documents more valuable than discussions, blogs ...

So I followed the official document at
http://wiki.apache.org/nutch/Nutch2Tutorial

If you want to encounter fewer problems, please follow it strictly. HBase 0.90 has to be used.

If you are new to Nutch, it is better to follow http://wiki.apache.org/nutch/NutchTutorial first. Some important configuration has to be set there.

Below are some problems I had.

  • When running:

bin/crawl /home/nutch/apache-nutch-2.2.1/urls/seeds.txt mmtest http://localhost:8983/solr/ 10
It reports:

SolrIndexerJob: starting
Adding 11 documents
Adding 11 documents
SolrIndexerJob: java.lang.RuntimeException: job failed: name=[mmtest]solr-index, jobid=job_local1440285148_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:46)



Solution:
  Copy the schema.xml shipped with Nutch into the Solr directory.
  • When running it again:
bin/crawl /home/nutch/apache-nutch-2.2.1/urls/seeds.txt mmtest http://localhost:8983/solr/ 10
It reports:
13033 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer  ? Unable to create core: collection1
org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "text": Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.EnglishPorterFilterFactory'. Schema file is /home/nutch/solr-4.5.1/example/solr/collection1/schema.xml
Solution:
Use SnowballPorterFilterFactory with language="English" instead of EnglishPorterFilterFactory
  • Running the crawl again
It reports:
5377 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer  ? Unable to create core: collection1
org.apache.solr.common.SolrException: Unable to use updateLog: _version_ field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
Solution:
Insert the line below into schema.xml:
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  • Running the crawl again
It reports:
4764 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore  ? org.apache.solr.common.SolrException: undefined field text
at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1235)
Solution:
In solrconfig.xml, change <str name="df">text</str> to
       <str name="df">content</str>

Now it works.
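
The three configuration fixes above can also be scripted. Below is a minimal Python sketch, not something from the original setup. The function names patch_schema and patch_solrconfig are made up for illustration, and it assumes that schema.xml declares its fields inside a <fields> element and that the filter is written as a self-closing <filter .../> tag. It only transforms text, so reading and writing the real files is left to the caller.

```python
import re

# The field line from the _version_ fix above.
VERSION_FIELD = (
    '<field name="_version_" type="long" indexed="true" '
    'stored="true" multiValued="false"/>'
)


def patch_schema(schema_xml: str) -> str:
    """Apply the two schema.xml fixes to the file's text (hypothetical helper)."""
    # Fix 1: swap EnglishPorterFilterFactory for SnowballPorterFilterFactory
    # with language="English", keeping any other attributes on the tag.
    patched = re.sub(
        r'<filter\s+class="solr\.EnglishPorterFilterFactory"([^>]*)/>',
        r'<filter class="solr.SnowballPorterFilterFactory" language="English"\1/>',
        schema_xml,
    )
    # Fix 2: add the _version_ field once, just before </fields>.
    # Assumption: the schema wraps its fields in a <fields> element;
    # if it does not, the text is left unchanged.
    if 'name="_version_"' not in patched:
        patched = patched.replace("</fields>", VERSION_FIELD + "\n</fields>", 1)
    return patched


def patch_solrconfig(solrconfig_xml: str) -> str:
    """Point the default search field at "content" instead of "text"."""
    return solrconfig_xml.replace(
        '<str name="df">text</str>', '<str name="df">content</str>'
    )


if __name__ == "__main__":
    # Tiny in-memory sample, not a complete schema.xml.
    sample = (
        "<fields>\n"
        '<field name="id" type="string" indexed="true" stored="true"/>\n'
        "</fields>\n"
        '<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>\n'
    )
    print(patch_schema(sample))
    print(patch_solrconfig('<str name="df">text</str>'))
```

To use it on a real installation, read each file's text, pass it through the matching function, write the result back (keep a backup first), and restart Solr before running the crawl again.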

Saturday, November 23, 2013

About Hyperic HQ's future

I have followed Hyperic for some years. After Hyperic HQ was acquired by SpringSource and then by VMware, it went down its own road and turned into "VMware vCenter Hyperic". The future of the open-source version is more vague...

Tuesday, November 12, 2013

Developing Hyperic HQ Plugin to monitor Apache Solr


Apache Solr is so popular that some people are looking for monitoring tools.
Because it is open source, the monitoring solution should be an open-source tool as well. So we developed a plugin based on Hyperic HQ, which is now owned by VMware.

The current plugin supports Solr v3.6. If other versions are required, please contact us.
Some of the metrics are listed below:
JVM Metrics
  activeThreadCount
  CurrentHeapSize
  TotalHeapSize
Searcher Metrics
  Searcher Number of Docs
  Searcher Max Docs
Query Metrics
  Query Result Cache Evictions
  Query Result Cache Hit Ratio
  Query Result Cache Hits
  Query Result Cache Inserts
  Query Result Cache Lookups
  Query Result Cache Sizes
Document Metrics
  Document Cache Evictions
  Document Cache Hit Ratio
  Document Cache Hits
  Document Cache Inserts
  Document Cache Lookups
  Document Cache Sizes
Filter Metrics
  Filter Cache Evictions
  Filter Cache Hit Ratio
  Filter Cache Hits
  Filter Cache Inserts
  Filter Cache Lookups
  Filter Cache Sizes
Update Metrics
  Update Handler Adds
  Update Handler Commits
  Update Handler Autocommits
  Update Handler Optimizes
  Update Handler Rollbacks
  Update Handler ExpungeDeletes
  Update Handler DocsPending
  Update Handler DeletesById
  Update Handler DeletesByQuery
  Update Handler Errors
Is it enough? If not, please contact me.

I provide installation and setup services. If interested, contact martinking1997@gmail.com

Enjoy.



Developing and improving Foglight Cartridge for Oracle Tuxedo


There is no out-of-the-box cartridge for Tuxedo in Dell Foglight, so we developed one to monitor it. The cartridge contains views, rules and reports to help monitor it.

Key Metrics Monitored

1. Servers
  Program Name, Queue Name, Group Name, Server ID, Requests Done, Load Done, Current Service
2. Services
  Service Name, Routine Name, Program Name, Grp Name, Server ID, Machine, Requests Done, Status
3. Queues
  Program Name, Queue Name, Servers on Queue, Wk Queued, Queued Service, Average Length, Machine
4. Clients
  LMID, User Name, Client Name, Time, Status, Transactions Begun, Transactions Committed, Transactions Aborted
5. Tuxedo Common Information
  Availability, Version, Total Requests, Total Queues Length

Interfaces of Views

1. Tuxedo Explorer
  The view displays availability status, health, version, alerts and amount of resources for each Tuxedo instance.
For more details, please visit http://www.innovatedigital.com/node/943

Developing Foglight Cartridge for Oracle® AS 10g OC4J 9.0.3 and OC4J 9.0.4


In 2004, Oracle released its application server, Oracle® Application Server Containers for J2EE 10g, with OC4J as the kernel. The versions were OC4J 9.0.3 and OC4J 9.0.4, running on JDK 1.3 or JDK 1.4.
Some customers deployed it for their business-critical applications, and those applications now need to be monitored.
Dell Foglight does not provide a solution out of the box, so InnovateDigital developed a cartridge based on Foglight.

The metrics collected:

JVM: Active Thread Groups, Active Threads, Free Memory, Total Memory, UpTime;
Servlet: Is Active, Average Execution Time, Completed Count, Max Execution Time, Min Execution Time, Total Time, Current Average Execution Time;
JDBC Connection: Create Time, Close Count, Create Count, Open Count;

At that time, the JDK and OC4J did not provide a standard interface such as JMX, so parsing the OC4J-specific data stream was the only way to fetch runtime information from OC4J.

More details: http://www.innovatedigital.com/node/957