yacy:rc1.git
Lotus [Thu, 2 Aug 2012 08:54:41 +0000 (10:54 +0200)]
Merge branch 'master' of git://git.gitorious.org/yacy/rc1.git into limited

Lotus [Thu, 2 Aug 2012 08:54:30 +0000 (10:54 +0200)]
adjusting link/word ratio by measurements

Michael Peter Christen [Tue, 31 Jul 2012 22:14:56 +0000 (00:14 +0200)]
added the JSON response writer to the solr interface; add &wt=json to the
servlet GET properties to use this format

Michael Peter Christen [Tue, 31 Jul 2012 21:49:56 +0000 (23:49 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Tue, 31 Jul 2012 21:49:07 +0000 (23:49 +0200)]
bad hack to prevent a bug appearing in solr

sixcooler [Tue, 31 Jul 2012 21:23:16 +0000 (23:23 +0200)]
prevent merge of blobs that can't be handled in memory

Michael Peter Christen [Mon, 30 Jul 2012 12:51:01 +0000 (14:51 +0200)]
fix for an NPE

Michael Peter Christen [Mon, 30 Jul 2012 10:39:47 +0000 (12:39 +0200)]
nowrap from gaston in forum
http://forum.yacy-websuche.de/viewtopic.php?p=26815#p26815

Michael Peter Christen [Mon, 30 Jul 2012 08:38:23 +0000 (10:38 +0200)]
snippet retrieval loading processes may use a smaller minimum load time
value than crawling processes. This speeds up the search result
preparation dramatically.
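
A minimal sketch of the idea, assuming a simple per-host politeness delay. The class name `HostDelay` and both delay values are hypothetical illustrations, not YaCy's actual code or defaults. A snippet fetch passes `forSnippet = true` and therefore waits for a shorter minimum interval than a crawler fetch to the same host.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-host politeness delay with a smaller minimum for snippet fetches. */
final class HostDelay {
    // Assumed example values; not YaCy's real defaults.
    static final long CRAWL_MIN_DELAY_MS = 500;
    static final long SNIPPET_MIN_DELAY_MS = 100;

    private final Map<String, Long> lastAccess = new HashMap<>();

    /** Milliseconds the caller should still wait before loading from host at time now. */
    long waitingTime(String host, long now, boolean forSnippet) {
        Long last = lastAccess.get(host);
        if (last == null) return 0; // host never contacted: no wait
        long min = forSnippet ? SNIPPET_MIN_DELAY_MS : CRAWL_MIN_DELAY_MS;
        return Math.max(0, last + min - now);
    }

    /** Remember when host was last contacted. */
    void recordAccess(String host, long now) {
        lastAccess.put(host, now);
    }
}
```

Keeping one access table for both kinds of fetch means snippet loads still respect the host, only with a lower floor.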

Michael Peter Christen [Fri, 27 Jul 2012 10:14:24 +0000 (12:14 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Fri, 27 Jul 2012 10:13:53 +0000 (12:13 +0200)]
Abstraction of HandleMap and HandleSet

sixcooler [Fri, 27 Jul 2012 02:11:52 +0000 (04:11 +0200)]
check content domain fix:
an image/media search should not show pages that merely contain image/media;
a text search should show all/text results but not image/media

sixcooler [Thu, 26 Jul 2012 16:09:40 +0000 (18:09 +0200)]
close the augmented stream when it is filled from the cache, to get its content;
use the augmented stream only if proxyAugmentation is set

Michael Peter Christen [Thu, 26 Jul 2012 08:05:06 +0000 (10:05 +0200)]
better calculation of possible savings in the HeapReader index data structure

Michael Peter Christen [Wed, 25 Jul 2012 19:34:23 +0000 (21:34 +0200)]
documentation/comments

Michael Peter Christen [Wed, 25 Jul 2012 19:18:30 +0000 (21:18 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

sixcooler [Wed, 25 Jul 2012 13:35:13 +0000 (15:35 +0200)]
no translation of queue-links

Michael Peter Christen [Wed, 25 Jul 2012 12:31:54 +0000 (14:31 +0200)]
cleaned up classes and methods which are either superfluous at this time
or will become superfluous or subject to a complete redesign after the
migration to solr. Removing these things now will make the transition to
solr simpler.

Michael Peter Christen [Tue, 24 Jul 2012 23:53:47 +0000 (01:53 +0200)]
Moved solr index-add method to the same method where the YaCy index is
written. Also done some code-cleanup.

Michael Peter Christen [Tue, 24 Jul 2012 20:16:56 +0000 (22:16 +0200)]
cleanup

Michael Peter Christen [Tue, 24 Jul 2012 15:29:32 +0000 (17:29 +0200)]
bugfix for an NPE

Michael Peter Christen [Tue, 24 Jul 2012 15:23:29 +0000 (17:23 +0200)]
extended abstraction of local and remote solr index using one front-end
for index administration and querying.

Michael Peter Christen [Mon, 23 Jul 2012 21:40:50 +0000 (23:40 +0200)]
fixed node type calculation for principal peers

Michael Peter Christen [Mon, 23 Jul 2012 19:43:14 +0000 (21:43 +0200)]
added user-authentication protection to solr search (same as implemented
for yacysearch)

Michael Peter Christen [Mon, 23 Jul 2012 19:31:12 +0000 (21:31 +0200)]
better explain how to access the embedded solr

Michael Peter Christen [Mon, 23 Jul 2012 14:28:39 +0000 (16:28 +0200)]
changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.

Michael Peter Christen [Sun, 22 Jul 2012 22:36:18 +0000 (00:36 +0200)]
fix for http://bugs.yacy.net/view.php?id=202

Michael Peter Christen [Sun, 22 Jul 2012 22:35:14 +0000 (00:35 +0200)]
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1

reger [Sun, 22 Jul 2012 22:00:40 +0000 (00:00 +0200)]
removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid)

Michael Peter Christen [Sun, 22 Jul 2012 19:50:44 +0000 (21:50 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Sun, 22 Jul 2012 11:18:45 +0000 (13:18 +0200)]
- more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off

orbiter [Sat, 21 Jul 2012 11:34:57 +0000 (13:34 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Fri, 20 Jul 2012 09:47:50 +0000 (11:47 +0200)]
patches to ensure that solr connectors are active only if they have a
solr object assigned and vice versa

orbiter [Fri, 20 Jul 2012 09:40:33 +0000 (11:40 +0200)]
embedded solr is only initiated if it is activated with
IndexFederated_p.html

Michael Peter Christen [Fri, 20 Jul 2012 07:04:14 +0000 (09:04 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Fri, 20 Jul 2012 07:04:02 +0000 (09:04 +0200)]
source change in classpath

Lotus [Fri, 20 Jul 2012 06:53:12 +0000 (08:53 +0200)]
partial html fix for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4454

orbiter [Thu, 19 Jul 2012 22:59:58 +0000 (00:59 +0200)]
added classpath for htroot/solr

Michael Peter Christen [Thu, 19 Jul 2012 09:34:05 +0000 (11:34 +0200)]
added a solr search index
- by default, an (empty) solr storage instance is created at
SEGMENTS/solr_36
- the index is written if in /IndexFederated_p.html the flag "embedded
solr search index" is switched on
- a standard solr query interface is available now with a new servlet at
http://127.0.0.1:8090/solr/select

To test this, do the following:
- switch to webportal mode
- switch on the feature as described
- do a crawl. This fills the solr index. The normal YaCy search will NOT
work now!
- do a solr query, like:
http://127.0.0.1:8090/solr/select?q=*:*
http://127.0.0.1:8090/solr/select?q=text_t:Help
play with different search fields as you can see in
/IndexFederated_p.html
You can use the standard solr query attributes as described in
http://wiki.apache.org/solr/SearchHandler
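
A small sketch for building such query URLs programmatically, assuming the embedded servlet on the default address http://127.0.0.1:8090 used above. `SolrSelectUrl` is a hypothetical helper, not part of YaCy. The query value is URL-encoded, and the helper can optionally append `&wt=json`, the parameter that selects the JSON response writer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds query URLs for the embedded /solr/select servlet. */
final class SolrSelectUrl {
    static String build(String base, String query, boolean json) {
        // URLEncoder leaves '*' and '_' unencoded and turns ':' into %3A
        StringBuilder sb = new StringBuilder(base)
                .append("/solr/select?q=")
                .append(URLEncoder.encode(query, StandardCharsets.UTF_8));
        if (json) sb.append("&wt=json"); // ask for the JSON response writer
        return sb.toString();
    }
}
```

For example, `SolrSelectUrl.build("http://127.0.0.1:8090", "text_t:Help", false)` yields the second query URL from the list above, with the colon percent-encoded.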

Michael Peter Christen [Sat, 14 Jul 2012 14:28:14 +0000 (16:28 +0200)]
allow larger log entries

Michael Peter Christen [Sat, 14 Jul 2012 11:11:04 +0000 (13:11 +0200)]
removed a crawler overhead (terminated the loop which searches for the
largest stack that has zero-waiting urls). This should make crawling
slightly faster for crawl stacks with many different domains in the crawl
queue.

Michael Peter Christen [Sat, 14 Jul 2012 11:09:44 +0000 (13:09 +0200)]
enhancement in internal data organization which should generate fewer
synchronizations in database access

Michael Peter Christen [Fri, 13 Jul 2012 19:15:38 +0000 (21:15 +0200)]
collection of speed and memory saving hacks

orbiter [Thu, 12 Jul 2012 17:54:54 +0000 (19:54 +0200)]
less usage of generic logger to avoid logger generation overhead

orbiter [Thu, 12 Jul 2012 17:42:42 +0000 (19:42 +0200)]
prevent enqueueing of non-loggable logging entries

orbiter [Thu, 12 Jul 2012 17:23:40 +0000 (19:23 +0200)]
reduced logging overhead (a bit)

orbiter [Thu, 12 Jul 2012 09:14:04 +0000 (11:14 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Thu, 12 Jul 2012 09:12:21 +0000 (11:12 +0200)]
replaced more size() > 0 by !isEmpty()

Michael Peter Christen [Thu, 12 Jul 2012 00:08:11 +0000 (02:08 +0200)]
reduction of logging to prevent too much IO caused by logging

Michael Peter Christen [Wed, 11 Jul 2012 23:23:04 +0000 (01:23 +0200)]
fixed a memory leak inside the logger which appeared if the log was
written faster than the logger was able to print it to its output
stream. A very large collection of unwritten log outputs had been seen
during heavy crawling. The new ArrayBlockingQueue is size-limited to
prevent this case.
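
A minimal sketch of a bounded log buffer, assuming that entries are simply dropped (and counted) when the queue is full. The message does not say whether YaCy drops or blocks in that case, and `BoundedLogBuffer` is a hypothetical name. `offer()` never blocks the logging thread, unlike `put()`, so a slow output stream cannot stall the crawler; it only loses excess lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded buffer between code that logs and the thread that prints. */
final class BoundedLogBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedLogBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never blocks: if the printer has fallen behind and the queue is full, the entry is dropped. */
    boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    /** Called by the printing thread: removes and returns everything queued so far. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        queue.drainTo(out);
        return out;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```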

Michael Peter Christen [Wed, 11 Jul 2012 21:18:57 +0000 (23:18 +0200)]
added creation of subpath pattern when crawl start is 'from file'

orbiter [Tue, 10 Jul 2012 20:59:03 +0000 (22:59 +0200)]
- replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods

orbiter [Tue, 10 Jul 2012 15:39:56 +0000 (17:39 +0200)]
fix for url matcher of multiple &amp; in a url, see:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650

Roland 'Quix0r' Haeder [Tue, 10 Jul 2012 11:08:16 +0000 (13:08 +0200)]
- removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to the data path, which is different in the debian and
mac implementations

orbiter [Tue, 10 Jul 2012 10:01:20 +0000 (12:01 +0200)]
using SwitchboardConstants for solr attributes

sixcooler [Mon, 9 Jul 2012 16:58:33 +0000 (18:58 +0200)]
bump to httpclient-4.2.1

orbiter [Mon, 9 Jul 2012 12:33:11 +0000 (14:33 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Mon, 9 Jul 2012 12:32:35 +0000 (14:32 +0200)]
fix for RSS reader

orbiter [Mon, 9 Jul 2012 09:14:50 +0000 (11:14 +0200)]
refactoring of query attribute variable names for better consistency
with (next) stored query words

Michael Peter Christen [Sun, 8 Jul 2012 22:13:59 +0000 (00:13 +0200)]
Release 1.04 (tag: Release_1.04)

Michael Peter Christen [Sun, 8 Jul 2012 20:05:04 +0000 (22:05 +0200)]
use less memory for md5 cache

Michael Peter Christen [Sun, 8 Jul 2012 20:04:36 +0000 (22:04 +0200)]
more logging

Michael Peter Christen [Sun, 8 Jul 2012 19:25:22 +0000 (21:25 +0200)]
filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
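
A sketch of the age filter, assuming each peer record carries a last-seen timestamp in milliseconds. `BootstrapFilter` and its `Peer` class are hypothetical; only the 60-minute threshold (previously 240) comes from the message above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical bootstrap filter that drops peers not seen within the last 60 minutes. */
final class BootstrapFilter {
    // Threshold from the commit message (previously 240 minutes).
    static final long MAX_AGE_MS = TimeUnit.MINUTES.toMillis(60);

    static final class Peer {
        final String name;
        final long lastSeenMs;

        Peer(String name, long lastSeenMs) {
            this.name = name;
            this.lastSeenMs = lastSeenMs;
        }
    }

    /** Returns only the peers whose last contact is at most MAX_AGE_MS before nowMs. */
    static List<Peer> fresh(List<Peer> peers, long nowMs) {
        List<Peer> result = new ArrayList<>();
        for (Peer p : peers) {
            if (nowMs - p.lastSeenMs <= MAX_AGE_MS) result.add(p);
        }
        return result;
    }
}
```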

Michael Peter Christen [Sun, 8 Jul 2012 19:17:33 +0000 (21:17 +0200)]
added classification for control file types which shall not be loaded
but placed onto the noload-queue

Michael Peter Christen [Sun, 8 Jul 2012 15:59:20 +0000 (17:59 +0200)]
added webm mime-type

Michael Peter Christen [Sun, 8 Jul 2012 15:58:05 +0000 (17:58 +0200)]
added webm

Michael Peter Christen [Sun, 8 Jul 2012 14:48:09 +0000 (16:48 +0200)]
fix for url camel case parser and sentence reader

Michael Peter Christen [Sun, 8 Jul 2012 14:11:50 +0000 (16:11 +0200)]
fix for sitemap importer: can now also import very large sitemaps within
small memory configurations

Michael Peter Christen [Sun, 8 Jul 2012 14:11:19 +0000 (16:11 +0200)]
fix for sevenzip parser

Michael Peter Christen [Fri, 6 Jul 2012 07:21:12 +0000 (09:21 +0200)]
catch and log a warning in RasterPlotter

Michael Peter Christen [Fri, 6 Jul 2012 07:05:41 +0000 (09:05 +0200)]
- fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors

Michael Peter Christen [Fri, 6 Jul 2012 06:29:41 +0000 (08:29 +0200)]
prevent loading of content from the cache when retrieval with IFFRESH is
used and cache is stale. Should speed up snippet generation when cache
strategy is IFFRESH.
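
A sketch of the cache decision, using a hypothetical `FetchStrategy` enum. Only IFFRESH is named in the message; the other constants and the `CachePolicy` class are assumptions for illustration. With IFFRESH, a stale entry is never read from the cache, so no time is spent loading content that would be discarded anyway.

```java
/** Hypothetical cache strategies; only IFFRESH is named in the commit message. */
enum FetchStrategy { NOCACHE, IFFRESH, IFEXIST, CACHEONLY }

final class CachePolicy {
    /** Decide whether a fetch should read the cached copy instead of going to the network. */
    static boolean loadFromCache(FetchStrategy strategy, boolean cached, boolean stale) {
        if (!cached) return false; // nothing in the cache to load
        switch (strategy) {
            case NOCACHE:
                return false; // always fetch from the network
            case IFFRESH:
                return !stale; // stale entries are not read from the cache
            case IFEXIST:
            case CACHEONLY:
                return true; // any cached copy is acceptable
            default:
                return false;
        }
    }
}
```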

Michael Peter Christen [Thu, 5 Jul 2012 23:29:13 +0000 (01:29 +0200)]
fix to solr configuration (case where the external solr was not online)

sixcooler [Thu, 5 Jul 2012 12:50:37 +0000 (14:50 +0200)]
more abstraction of error message

Michael Peter Christen [Thu, 5 Jul 2012 12:27:28 +0000 (14:27 +0200)]
abstraction of error message

Michael Peter Christen [Thu, 5 Jul 2012 12:24:19 +0000 (14:24 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Thu, 5 Jul 2012 12:24:03 +0000 (14:24 +0200)]
fix for pattern matcher in html parser

Michael Peter Christen [Thu, 5 Jul 2012 12:23:43 +0000 (14:23 +0200)]
fix for solr shutdown

Michael Peter Christen [Thu, 5 Jul 2012 12:23:29 +0000 (14:23 +0200)]
fix for urls beginning with "//"

sixcooler [Thu, 5 Jul 2012 12:06:00 +0000 (14:06 +0200)]
fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4430

Michael Peter Christen [Thu, 5 Jul 2012 10:38:41 +0000 (12:38 +0200)]
made class methods static where possible

Michael Peter Christen [Thu, 5 Jul 2012 09:18:31 +0000 (11:18 +0200)]
- removed unnecessary semicolons
- added default case for switch

Michael Peter Christen [Thu, 5 Jul 2012 09:09:44 +0000 (11:09 +0200)]
removed unreachable code

Michael Peter Christen [Thu, 5 Jul 2012 08:44:30 +0000 (10:44 +0200)]
removed more unused method parameters

Michael Peter Christen [Thu, 5 Jul 2012 08:24:52 +0000 (10:24 +0200)]
removed unused ImageReference package

Michael Peter Christen [Thu, 5 Jul 2012 08:23:07 +0000 (10:23 +0200)]
removed unused method parameters

Michael Peter Christen [Thu, 5 Jul 2012 07:21:27 +0000 (09:21 +0200)]
removed snippet pattern filter - it was not used

Michael Peter Christen [Thu, 5 Jul 2012 07:14:04 +0000 (09:14 +0200)]
- added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements

Michael Peter Christen [Thu, 5 Jul 2012 06:44:39 +0000 (08:44 +0200)]
cleaned unnecessary nested code

Michael Peter Christen [Wed, 4 Jul 2012 23:02:51 +0000 (01:02 +0200)]
replaced non-generic array with collection

Michael Peter Christen [Wed, 4 Jul 2012 22:43:41 +0000 (00:43 +0200)]
adding more principal peers for bootstrapping

orbiter [Wed, 4 Jul 2012 22:20:58 +0000 (00:20 +0200)]
More SentenceReader cleanup

orbiter [Wed, 4 Jul 2012 20:06:20 +0000 (22:06 +0200)]
Simplified SentenceReader (no more Reader inside..)

orbiter [Wed, 4 Jul 2012 19:56:25 +0000 (21:56 +0200)]
replaced HashARC with SizeLimited Objects which are less costly

orbiter [Wed, 4 Jul 2012 19:15:38 +0000 (21:15 +0200)]
more tolerance when creating solr document

orbiter [Wed, 4 Jul 2012 19:15:10 +0000 (21:15 +0200)]
refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was
inefficient for two reasons:
- it used an overly complex input stream, generated from sources and
UTF-8 byte conversions; the BufferedReader added a heavy overhead.
- to feed data into the SentenceReader, multiple toString/getBytes calls
had been applied before a buffered Reader over an input stream was
possible.
These superfluous conversions have been removed. The best source for the
SentenceReader is a String; therefore the Document class now produces
Strings.
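
A minimal sketch of reading sentences directly from a String by index, with no Reader and no byte conversion in between. `StringSentenceIterator` is hypothetical and not YaCy's SentenceReader; it splits only on `.`, `!`, `?` and line breaks and drops that punctuation, which is a simplification of real sentence detection.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sentence iterator that scans a String by index, with no Reader or byte conversion. */
final class StringSentenceIterator implements Iterator<String> {
    private final String text;
    private int pos = 0;
    private String next; // sentence that hasNext() reports, or null at the end

    StringSentenceIterator(String text) {
        this.text = text;
        advance();
    }

    // Simplification: only these characters end a sentence.
    private static boolean isTerminator(char c) {
        return c == '.' || c == '!' || c == '?' || c == '\n';
    }

    /** Find the next non-empty sentence, or set next to null when the text is exhausted. */
    private void advance() {
        next = null;
        while (pos < text.length()) {
            int start = pos;
            while (pos < text.length() && !isTerminator(text.charAt(pos))) pos++;
            int end = pos;
            while (pos < text.length() && isTerminator(text.charAt(pos))) pos++; // skip terminators
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                next = sentence;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) throw new NoSuchElementException();
        String result = next;
        advance();
        return result;
    }
}
```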

orbiter [Tue, 3 Jul 2012 16:22:25 +0000 (18:22 +0200)]
automatically adapt size of word cache to available memory
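
A sketch of memory-dependent sizing, assuming the cache may use roughly a tenth of the currently available heap and that each entry costs about 1 KB. All constants and the `WordCacheSizing` name are hypothetical; `Runtime.maxMemory()`, `totalMemory()` and `freeMemory()` are standard JDK calls.

```java
/** Hypothetical sizing rule for a word cache that adapts to available heap memory. */
final class WordCacheSizing {
    // Assumed tuning constants, for illustration only.
    static final long BYTES_PER_ENTRY = 1024; // rough cost of one cached entry
    static final long MEMORY_DIVISOR = 10;    // use about a tenth of the available heap
    static final int MIN_ENTRIES = 1_000;
    static final int MAX_ENTRIES = 1_000_000;

    /** Number of cache entries to allow for the given amount of available memory. */
    static int entriesFor(long availableBytes) {
        long n = availableBytes / MEMORY_DIVISOR / BYTES_PER_ENTRY;
        return (int) Math.max(MIN_ENTRIES, Math.min(MAX_ENTRIES, n));
    }

    /** Heap that can still be allocated: max heap minus what is currently in use. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }
}
```

A caller would compute `WordCacheSizing.entriesFor(WordCacheSizing.availableBytes())` at startup; the lower and upper bounds keep the cache usable on tiny heaps and finite on huge ones.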

Michael Peter Christen [Tue, 3 Jul 2012 15:20:41 +0000 (17:20 +0200)]
clean up parser data

Michael Peter Christen [Tue, 3 Jul 2012 15:06:20 +0000 (17:06 +0200)]
Adding a limit of 1000 links that a parser shall store during indexing.
A limit was necessary because some web pages have such huge numbers of
links that they can easily cause an OOM just by the number of links.
The question whether 1000 links is sufficient or too few must
be answered by testing this feature.
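
A sketch of a capped link collector. `LinkCollector` is hypothetical; only the limit of 1000 comes from the message above. Links beyond the cap are counted but not stored, so memory use stays bounded however many links a page contains.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical per-document link store with a hard cap. */
final class LinkCollector {
    static final int MAX_LINKS = 1000; // limit from the commit message

    private final Set<String> links = new LinkedHashSet<>(); // keeps first-seen order, ignores duplicates
    private int ignored = 0;

    /** Stores url unless the cap is reached; returns true only if it was newly stored. */
    boolean add(String url) {
        if (links.size() >= MAX_LINKS) {
            ignored++; // counted but not kept, so memory stays bounded
            return false;
        }
        return links.add(url);
    }

    int size() {
        return links.size();
    }

    int ignoredCount() {
        return ignored;
    }
}
```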

Michael Peter Christen [Tue, 3 Jul 2012 05:12:20 +0000 (07:12 +0200)]
- better data structures in secondary search
- fixed a big memory leak in secondary search