yacy:rc1.git
6 years agorelease 1.1 Release_1.1
orbiter [Fri, 24 Aug 2012 21:59:10 +0000 (23:59 +0200)]
release 1.1

6 years agoadded a direct access to solr search api to enhance the visibility of
orbiter [Fri, 24 Aug 2012 21:04:19 +0000 (23:04 +0200)]
added a direct access to solr search api to enhance the visibility of
the embedded solr

6 years agosmall fixes
orbiter [Fri, 24 Aug 2012 19:44:22 +0000 (21:44 +0200)]
small fixes

6 years agoMerge commit 'c2341a175fdd755a34965ff63c7ea437b380352d'
orbiter [Fri, 24 Aug 2012 16:24:24 +0000 (18:24 +0200)]
Merge commit 'c2341a175fdd755a34965ff63c7ea437b380352d'

6 years agoFixed a bug that prevented YaCy from indexing files with non-ASCII filenames in FTP...
David Rubio [Fri, 24 Aug 2012 15:45:14 +0000 (17:45 +0200)]
Fixed a bug that prevented YaCy from indexing files with non-ASCII filenames in FTP servers.

Previously YaCy could read file listings in UTF-8, but couldn't send commands to the FTP server in UTF-8 (the second byte of every multi-byte character was ignored), which caused a lot of errors on the server side.
Now it handles UTF-8 correctly.
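
The log does not show the code behind this fix, so the following is only a minimal sketch of the general class of bug: bytes get lost when a command is written character by character instead of through a charset encoder. The class and method names are hypothetical and are not taken from YaCy.

```java
import java.nio.charset.StandardCharsets;

class FtpCommandEncoding {

    // Lossy variant (illustration only): narrowing each char to a byte keeps
    // just its low 8 bits, so a non-ASCII character never reaches the server
    // as a complete UTF-8 sequence.
    static byte[] encodeLossy(String command) {
        byte[] out = new byte[command.length()];
        for (int i = 0; i < command.length(); i++) {
            out[i] = (byte) command.charAt(i);
        }
        return out;
    }

    // Correct variant: the charset encoder emits every byte of each character.
    static byte[] encodeUtf8(String command) {
        return command.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical command with one non-ASCII character (e with acute accent).
        String command = "RETR caf\u00e9.txt\r\n";
        System.out.println("lossy bytes: " + encodeLossy(command).length);
        System.out.println("UTF-8 bytes: " + encodeUtf8(command).length);
    }
}
```

The two byte counts differ by one because the accented character needs two bytes in UTF-8.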

6 years agofixed concurrent query
orbiter [Fri, 24 Aug 2012 12:15:40 +0000 (14:15 +0200)]
fixed concurrent query

6 years agofixed generation of ontologies from index enumerations
orbiter [Fri, 24 Aug 2012 12:13:42 +0000 (14:13 +0200)]
fixed generation of ontologies from index enumerations

6 years agoomit xml parsing when using the embedded solr server
orbiter [Fri, 24 Aug 2012 10:18:30 +0000 (12:18 +0200)]
omit xml parsing when using the embedded solr server

6 years agoadded the
orbiter [Thu, 23 Aug 2012 09:53:54 +0000 (11:53 +0200)]
added the
QueryResponse query(SolrParams params)
method to the SolrServerConnector which is necessary to use facets in
solr search.

6 years agoredesign of YaCySchema and SolrDoc handling
orbiter [Thu, 23 Aug 2012 07:51:45 +0000 (09:51 +0200)]
redesign of YaCySchema and SolrDoc handling

6 years agorefactoring
orbiter [Thu, 23 Aug 2012 07:30:11 +0000 (09:30 +0200)]
refactoring

6 years agolog queries anonymously from gsa+solr requests
Michael Peter Christen [Wed, 22 Aug 2012 21:50:40 +0000 (23:50 +0200)]
log queries anonymously from gsa+solr requests

6 years agoadded snippet computation to solr/rss and gsa result writer
Michael Peter Christen [Wed, 22 Aug 2012 15:37:34 +0000 (17:37 +0200)]
added snippet computation to solr/rss and gsa result writer

6 years ago- reduced memory usage in index transmission using a transformation of
Michael Peter Christen [Wed, 22 Aug 2012 14:30:33 +0000 (16:30 +0200)]
- reduced memory usage in index transmission using a transformation of
Node to Row objects
- removed peerDeparture in solr remote search in case that peer does not
answer (this may be normal because it is allowed to switch this off)

6 years agore-activated audio and video search because they obviously work (!)
Michael Peter Christen [Tue, 21 Aug 2012 23:56:13 +0000 (01:56 +0200)]
re-activated audio and video search because they obviously work (!)

6 years agofix for NPE during host navigation computation
Michael Peter Christen [Tue, 21 Aug 2012 23:55:39 +0000 (01:55 +0200)]
fix for NPE during host navigation computation

6 years agofixed GSA format
Michael Peter Christen [Tue, 21 Aug 2012 22:48:37 +0000 (00:48 +0200)]
fixed GSA format

6 years agocorrected solr query syntax
Michael Peter Christen [Tue, 21 Aug 2012 22:48:03 +0000 (00:48 +0200)]
corrected solr query syntax

6 years ago- enhanced caching after search queries to solr
Michael Peter Christen [Tue, 21 Aug 2012 22:31:14 +0000 (00:31 +0200)]
- enhanced caching after search queries to solr
- reduced caching after short memory

6 years agosorted the solr schema into mandatory and optional fields; reduced
orbiter [Tue, 21 Aug 2012 21:52:56 +0000 (23:52 +0200)]
sorted the solr schema into mandatory and optional fields; reduced
the number of fields used to reduce solr index size

6 years agofix from gaston in
orbiter [Tue, 21 Aug 2012 19:03:26 +0000 (21:03 +0200)]
fix from gaston in
http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909

6 years agoremoved unused classes
orbiter [Tue, 21 Aug 2012 16:18:30 +0000 (18:18 +0200)]
removed unused classes

6 years agogsa bugfix for date parser
Michael Peter Christen [Tue, 21 Aug 2012 00:39:28 +0000 (02:39 +0200)]
gsa bugfix for date parser

6 years agofixes for gsa result format
Michael Peter Christen [Mon, 20 Aug 2012 23:57:46 +0000 (01:57 +0200)]
fixes for gsa result format

6 years agoadded authorization-based maximum results limitation to solr and gsa
Michael Peter Christen [Mon, 20 Aug 2012 15:10:48 +0000 (17:10 +0200)]
added authorization-based maximum results limitation to solr and gsa
search

6 years agoadded gzip encoding to solr2solr http interface, client side (server
Michael Peter Christen [Mon, 20 Aug 2012 14:53:21 +0000 (16:53 +0200)]
added gzip encoding to solr2solr http interface, client side (server
already works)

6 years agofixed double-check
Michael Peter Christen [Mon, 20 Aug 2012 12:16:37 +0000 (14:16 +0200)]
fixed double-check

6 years agoadded a tooltip for search navigation to mention that search pages can
Michael Peter Christen [Mon, 20 Aug 2012 11:02:29 +0000 (13:02 +0200)]
added a tooltip for search navigation to mention that search pages can
be navigated using the TAB key

6 years agogsa format update
Michael Peter Christen [Mon, 20 Aug 2012 10:50:51 +0000 (12:50 +0200)]
gsa format update

6 years agobugfix for remote search when search is done to solr
Michael Peter Christen [Mon, 20 Aug 2012 10:21:36 +0000 (12:21 +0200)]
bugfix for remote search when search is done to solr

6 years agoadded remote search to solr on YaCy peers!
Michael Peter Christen [Mon, 20 Aug 2012 10:16:11 +0000 (12:16 +0200)]
added remote search to solr on YaCy peers!
- when doing a remote search, node peers are selected for solr queries
- the solr query is done concurrently to the standard YaCy rwi search
- the solr search result is fed into the same data structure that
prepares the rwi search result
- the same remote search that is done to several outside peers is done to
the local solr index
- the search process works now also without any 'old' RWI data using
solr
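
The pattern described above (two searches running concurrently and feeding one shared result structure) can be sketched as follows. This is an illustration only: the class name, the use of Supplier for the search back ends, and the choice of a sorted concurrent set are assumptions, not YaCy's actual classes.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.function.Supplier;

class ConcurrentSearchSketch {

    // Run every source concurrently and collect all hits in one thread-safe set,
    // so duplicates found by more than one source appear only once.
    static Set<String> searchAll(List<Supplier<List<String>>> sources) {
        Set<String> results = new ConcurrentSkipListSet<>();
        CompletableFuture<?>[] tasks = sources.stream()
            .map(source -> CompletableFuture.runAsync(() -> results.addAll(source.get())))
            .toArray(CompletableFuture[]::new);
        // Wait until every source has delivered its hits.
        CompletableFuture.allOf(tasks).join();
        return results;
    }

    public static void main(String[] args) {
        // Stand-ins for the rwi search and the solr search (hypothetical data).
        Supplier<List<String>> rwi = () -> List.of("http://a.example/", "http://b.example/");
        Supplier<List<String>> solr = () -> List.of("http://b.example/", "http://c.example/");
        System.out.println(searchAll(List.of(rwi, solr)));
    }
}
```

Because both sources write into the same set, the search still returns results when one of them yields nothing, which mirrors the note that the process also works without any RWI data.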

6 years agomore abstraction and less parameter overhead for remote search
Michael Peter Christen [Sun, 19 Aug 2012 23:29:15 +0000 (01:29 +0200)]
more abstraction and less parameter overhead for remote search

6 years agocode simplifications
Michael Peter Christen [Sun, 19 Aug 2012 11:17:03 +0000 (13:17 +0200)]
code simplifications

6 years agoremoved strange assert statements and simplified code in metadata
Michael Peter Christen [Sun, 19 Aug 2012 06:44:39 +0000 (08:44 +0200)]
removed strange assert statements and simplified code in metadata
transformation

6 years agofix for http://bugs.yacy.net/view.php?id=206
Michael Peter Christen [Sun, 19 Aug 2012 06:43:56 +0000 (08:43 +0200)]
fix for http://bugs.yacy.net/view.php?id=206

6 years agorefactoring in remote search and stub for remote node peer selection
orbiter [Sat, 18 Aug 2012 21:59:25 +0000 (23:59 +0200)]
refactoring in remote search and stub for remote node peer selection

6 years ago- get nice text_t values from metadata conversions that are stored into
orbiter [Sat, 18 Aug 2012 17:36:21 +0000 (19:36 +0200)]
- get nice text_t values from metadata conversions that are stored into
solr as fulltext search index.
- added slow migration from old metadata to solr index entries: each
entry from the old metadata is removed from that data structure and
written into solr.
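
A minimal sketch of such an entry-by-entry migration, assuming both stores can be modelled as maps and that "slow" means moving a bounded batch per call. Both assumptions and all names are hypothetical; the real YaCy data structures are not shown in this log.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class SlowMigrationSketch {

    // Move at most batchSize entries from the old store to the new one.
    // Each entry is written to the new store first and only then removed
    // from the old one, so no entry is ever missing from both stores.
    static int migrateBatch(Map<String, String> oldStore, Map<String, String> newStore, int batchSize) {
        int moved = 0;
        Iterator<Map.Entry<String, String>> it = oldStore.entrySet().iterator();
        while (moved < batchSize && it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            newStore.put(entry.getKey(), entry.getValue());
            it.remove();
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        Map<String, String> oldMetadata = new LinkedHashMap<>();
        oldMetadata.put("hash1", "http://a.example/");
        oldMetadata.put("hash2", "http://b.example/");
        oldMetadata.put("hash3", "http://c.example/");
        Map<String, String> solrIndex = new LinkedHashMap<>();

        // Repeat small batches until the old store is empty.
        while (migrateBatch(oldMetadata, solrIndex, 2) > 0) {
            System.out.println("old: " + oldMetadata.size() + ", new: " + solrIndex.size());
        }
    }
}
```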

6 years agoreduced sleep times
orbiter [Sat, 18 Aug 2012 15:48:20 +0000 (17:48 +0200)]
reduced sleep times

6 years agoadded remaining iteration methods for solr in fulltext class
orbiter [Sat, 18 Aug 2012 13:39:14 +0000 (15:39 +0200)]
added remaining iteration methods for solr in fulltext class

6 years agohack to remove StringBuilder overhead in query construction
orbiter [Sat, 18 Aug 2012 12:22:00 +0000 (14:22 +0200)]
hack to remove StringBuilder overhead in query construction

6 years agoreduced solr cache sizes to check if that solves memory problems a bit
orbiter [Sat, 18 Aug 2012 11:45:37 +0000 (13:45 +0200)]
reduced solr cache sizes to check if that solves memory problems a bit

6 years agoexplicit double-check in transferURL
orbiter [Sat, 18 Aug 2012 11:18:51 +0000 (13:18 +0200)]
explicit double-check in transferURL

6 years agofixes for putDocument and putMetadata
orbiter [Sat, 18 Aug 2012 11:05:27 +0000 (13:05 +0200)]
fixes for putDocument and putMetadata

6 years agoreverted bf55f6917652909f8eb465ccefd1f7ccb4c4d364
orbiter [Sat, 18 Aug 2012 08:28:40 +0000 (10:28 +0200)]
reverted bf55f6917652909f8eb465ccefd1f7ccb4c4d364
to have a fall-back option in case that memory problems as reported in
http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901
for full-solr installation are too strong and we have to work with a
'small memory footprint' peer system.

6 years agoadded concurrent iterator methods to the solr connectors
Michael Peter Christen [Fri, 17 Aug 2012 16:22:56 +0000 (18:22 +0200)]
added concurrent iterator methods to the solr connectors

6 years agorefactoring
Michael Peter Christen [Fri, 17 Aug 2012 15:28:27 +0000 (17:28 +0200)]
refactoring

6 years agobetter check for bad urls in url transmission
Michael Peter Christen [Fri, 17 Aug 2012 15:17:00 +0000 (17:17 +0200)]
better check for bad urls in url transmission

6 years agoadded deleteByQuery to solr connectors
Michael Peter Christen [Fri, 17 Aug 2012 15:05:46 +0000 (17:05 +0200)]
added deleteByQuery to solr connectors

6 years agorefactoring
Michael Peter Christen [Fri, 17 Aug 2012 13:52:33 +0000 (15:52 +0200)]
refactoring

6 years agoremoved write methods to old metadata file type; all metadata now goes
Michael Peter Christen [Fri, 17 Aug 2012 13:46:26 +0000 (15:46 +0200)]
removed write methods to old metadata file type; all metadata now goes
to solr

6 years agorefactoring
Michael Peter Christen [Fri, 17 Aug 2012 13:33:02 +0000 (15:33 +0200)]
refactoring

6 years agoupgrade to solr 3.6.1
Michael Peter Christen [Fri, 17 Aug 2012 13:11:21 +0000 (15:11 +0200)]
upgrade to solr 3.6.1

6 years agoMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen [Fri, 17 Aug 2012 12:45:18 +0000 (14:45 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

6 years agocode simplification
Michael Peter Christen [Fri, 17 Aug 2012 12:43:32 +0000 (14:43 +0200)]
code simplification

6 years agobugfix for solr connector, possibly a cause for
Michael Peter Christen [Fri, 17 Aug 2012 12:34:31 +0000 (14:34 +0200)]
bugfix for solr connector, possibly a cause for
http://forum.yacy-websuche.de/viewtopic.php?p=26893#p26893

6 years agoenhanced snippet fetch - removed a bug that caused documents to be
Michael Peter Christen [Fri, 17 Aug 2012 12:22:07 +0000 (14:22 +0200)]
enhanced snippet fetch - removed a bug that caused documents to be
parsed even if a solr text was available

6 years agolocal robots.txt: disallow external crawlers from following the URL proxy
cominch [Fri, 17 Aug 2012 09:47:39 +0000 (11:47 +0200)]
local robots.txt: disallow external crawlers from following the URL proxy

6 years ago- refactoring (load -> getMetadata)
Michael Peter Christen [Thu, 16 Aug 2012 23:34:38 +0000 (01:34 +0200)]
- refactoring (load -> getMetadata)
- added getDocument to retrieve Solr documents which shall replace
getMetadata

6 years agousing the solr search index to concurrently search within solr and the
Michael Peter Christen [Thu, 16 Aug 2012 23:21:56 +0000 (01:21 +0200)]
using the solr search index to concurrently search within solr and the
rwis during local search requests.

6 years agoadded clear-text search words in query params
Michael Peter Christen [Thu, 16 Aug 2012 21:05:37 +0000 (23:05 +0200)]
added clear-text search words in query params

6 years ago- added a content-encoding: gzip to streamed http server responses
Michael Peter Christen [Thu, 16 Aug 2012 20:35:19 +0000 (22:35 +0200)]
- added a content-encoding: gzip to streamed http server responses
- finish and close streamed http responses immediately
- this applies only to the solr interface which should be much faster
now!
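
As an illustration of what gzip content encoding does to a response body, here is a small round trip using the JDK's gzip streams. It is a sketch under the assumption that the body is available as a byte array; the class name and sample body are hypothetical and do not come from the YaCy server code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBodySketch {

    // Compress a response body; a server would send the result together
    // with the header "Content-Encoding: gzip".
    static byte[] compress(byte[] body) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    // Decompress a body on the client side.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive XML-like text (made up) compresses well.
        byte[] body = "<doc>solr</doc>".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(body);
        System.out.println(body.length + " bytes -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(body, decompress(packed)));
    }
}
```

Closing the gzip stream as soon as the body is complete matters here: an unclosed stream leaves the trailer unwritten and the client cannot finish decoding, which fits the commit's point about finishing and closing streamed responses immediately.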

6 years agoFOR THE BRAVE.. this is a forced migration to solr which is now ready
Michael Peter Christen [Thu, 16 Aug 2012 16:17:47 +0000 (18:17 +0200)]
FOR THE BRAVE.. this is a forced migration to solr which is now ready
for production as a replacement of the metadata-db.
This intermediate release 1.041 will switch on the previously optional
solr index and the old metadata-db will still work as it did before.
Solr+metadata are accessed in mixed mode, no migration is done yet.
If this does not cause a catastrophe by the end of the weekend, we will
do a YaCy 1.1 main release containing this as default.

6 years agodoctype2mime fix, influences metadata conversion between old metadata
Michael Peter Christen [Thu, 16 Aug 2012 15:49:35 +0000 (17:49 +0200)]
doctype2mime fix, influences metadata conversion between old metadata
and solr

6 years agoMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen [Thu, 16 Aug 2012 15:45:26 +0000 (17:45 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

6 years agochanged local robots.txt to prevent external crawlers from submitting random
cominch [Thu, 16 Aug 2012 15:38:10 +0000 (17:38 +0200)]
changed local robots.txt to prevent external crawlers from submitting random
search queries

6 years agomore attempts to clean the index (cleaning is faster then)
Michael Peter Christen [Thu, 16 Aug 2012 15:24:25 +0000 (17:24 +0200)]
more attempts to clean the index (cleaning is faster then)

6 years agofixed some peer-ping connection details
Michael Peter Christen [Thu, 16 Aug 2012 15:11:54 +0000 (17:11 +0200)]
fixed some peer-ping connection details
- larger time-out
- removed too old seedlist
- fixed a bug in connection test

6 years agoget the peer location more quickly
Michael Peter Christen [Thu, 16 Aug 2012 14:28:57 +0000 (16:28 +0200)]
get the peer location more quickly

6 years agofix for Index out of bounds exception in Network servlet
orbiter [Thu, 16 Aug 2012 05:47:52 +0000 (07:47 +0200)]
fix for Index out of bounds exception in Network servlet

6 years agoaddon to e74d66e28cce7b9674ad5011e5db7970ccaf5635
orbiter [Thu, 16 Aug 2012 05:28:38 +0000 (07:28 +0200)]
addon to e74d66e28cce7b9674ad5011e5db7970ccaf5635
(removed htmlparser.jar): for Mac App

6 years agofix xss bug #204
Lotus [Wed, 15 Aug 2012 12:23:21 +0000 (14:23 +0200)]
fix xss bug #204

6 years agoreplaced yacy xml encoding by solr xml encoding
Michael Peter Christen [Tue, 14 Aug 2012 11:29:11 +0000 (13:29 +0200)]
replaced yacy xml encoding by solr xml encoding

6 years agoenhanced GSA and RSS output format: corrected date, added some missing
Michael Peter Christen [Tue, 14 Aug 2012 11:19:29 +0000 (13:19 +0200)]
enhanced GSA and RSS output format: corrected date, added some missing
fields, added xml encoding for utf8

6 years agoMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen [Tue, 14 Aug 2012 10:40:44 +0000 (12:40 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

6 years agoadded a very rudimentary, incomplete, non-verified GSA response writer
Michael Peter Christen [Tue, 14 Aug 2012 10:40:26 +0000 (12:40 +0200)]
added a very rudimentary, incomplete, non-verified GSA response writer
for solr. Try this:
http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10

6 years ago- added xslt support for solr result formats.
Michael Peter Christen [Tue, 14 Aug 2012 09:12:50 +0000 (11:12 +0200)]
- added xslt support for solr result formats.
try e.g.
http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl
- added servlet-side mime-type configuration for streamed servlets. this
is used for the result formatters in solr result formats
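
The example URL above can also be assembled programmatically. The sketch below only builds the query string with JDK classes; the host, port, and parameter values are the ones from the commit message, while the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class SolrSelectUrlSketch {

    // Join the parameters into a query string, percent-encoding keys and values.
    static String selectUrl(String base, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");       // match all documents
        params.put("start", "0");     // offset of the first result
        params.put("rows", "10");     // number of results
        params.put("wt", "xslt");     // response writer
        params.put("tr", "json.xsl"); // stylesheet applied by the xslt writer
        System.out.println(selectUrl("http://localhost:8090/solr/select", params));
    }
}
```

Note that the encoder writes the colon in `*:*` as `%3A`, which a server decodes back to the same query.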

6 years agoaugmented browsing: remove htmlparser library
cominch [Tue, 14 Aug 2012 08:09:46 +0000 (10:09 +0200)]
augmented browsing: remove htmlparser library

6 years agoaugmented browsing: replace htmlparser by jsoup, which is more stable
cominch [Tue, 14 Aug 2012 08:06:12 +0000 (10:06 +0200)]
augmented browsing: replace htmlparser by jsoup, which is more stable
and reliable

6 years agoadded a possibility to define a custom network definition URL for remote
cominch [Mon, 13 Aug 2012 14:57:53 +0000 (16:57 +0200)]
added a possibility to define a custom network definition URL for remote
management

6 years agoMerge remote-tracking branch 'original yacy/master'
cominch [Mon, 13 Aug 2012 14:48:14 +0000 (16:48 +0200)]
Merge remote-tracking branch 'original yacy/master'

6 years agooops
Michael Peter Christen [Mon, 13 Aug 2012 12:01:45 +0000 (14:01 +0200)]
oops

6 years ago- renamed DoubleSolrConnector to MirrorSolrConnector and added a
Michael Peter Christen [Mon, 13 Aug 2012 11:32:32 +0000 (13:32 +0200)]
- renamed DoubleSolrConnector to MirrorSolrConnector and added a
hit/miss/document cache to the MirrorSolrConnector.
- more abstraction to SolrDocument in Connector interface
- bugfixes in Solr field reader

6 years agoanother fix to the Solr metadata reading process and to the shutdown
Michael Peter Christen [Mon, 13 Aug 2012 09:13:53 +0000 (11:13 +0200)]
another fix to the Solr metadata reading process and to the shutdown
process

6 years ago- added coordinate storage in solr schema
Michael Peter Christen [Mon, 13 Aug 2012 08:40:04 +0000 (10:40 +0200)]
- added coordinate storage in solr schema
- fixed shutdown process
- fixed some solr-to-metadata reading
- added a large number of metadata attributes in ViewFile.html

6 years agoMerge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Michael Peter Christen [Fri, 10 Aug 2012 23:21:18 +0000 (01:21 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

6 years agoremoved unused class
Michael Peter Christen [Fri, 10 Aug 2012 23:05:40 +0000 (01:05 +0200)]
removed unused class

6 years agotried to prevent calls to bad-hack getSize() method and reduced overhead
orbiter [Fri, 10 Aug 2012 16:10:25 +0000 (18:10 +0200)]
tried to prevent calls to bad-hack getSize() method and reduced overhead
of that method a bit.

6 years agopatch from hint in
orbiter [Fri, 10 Aug 2012 13:44:37 +0000 (15:44 +0200)]
patch from hint in
http://forum.yacy-websuche.de/viewtopic.php?p=26858#p26858
from gaston

6 years agochanged behaviour of metadata storage: in case that any solr is
orbiter [Fri, 10 Aug 2012 13:39:10 +0000 (15:39 +0200)]
changed behaviour of metadata storage: in case that any solr is
attached, the metadata is written to solr instead of the metadata-db,
even if the metadata-db is enabled. This prevents metadata from being
written to two storage systems at the same time. It is also the next step
in migrating the current metadata-db to solr.

6 years agoremoved unused classes
orbiter [Fri, 10 Aug 2012 12:47:44 +0000 (14:47 +0200)]
removed unused classes

6 years ago- Implemented and integrated the URIMetadataNode object which is a
Michael Peter Christen [Fri, 10 Aug 2012 11:26:51 +0000 (13:26 +0200)]
- Implemented and integrated the URIMetadataNode object which is a
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major step forward to a full migration to Solr.

6 years agomore abstraction of the YaCySchema -> Opensearch matching process
Michael Peter Christen [Fri, 10 Aug 2012 07:48:15 +0000 (09:48 +0200)]
more abstraction of the YaCySchema -> Opensearch matching process

6 years agoMerge branch 'master' of git://gitorious.org/~chalker/yacy/chalkers-yacy-rc1
Michael Peter Christen [Fri, 10 Aug 2012 07:47:15 +0000 (09:47 +0200)]
Merge branch 'master' of git://gitorious.org/~chalker/yacy/chalkers-yacy-rc1

6 years agomore abstraction for solr query params parsing
Michael Peter Christen [Fri, 10 Aug 2012 05:58:45 +0000 (07:58 +0200)]
more abstraction for solr query params parsing

6 years agoset the title every time, it is possible that it has changed
Michael Peter Christen [Fri, 10 Aug 2012 05:51:57 +0000 (07:51 +0200)]
set the title every time, it is possible that it has changed

6 years agobetter abstraction for result writers using controlled vocabularies and
Michael Peter Christen [Fri, 10 Aug 2012 05:45:43 +0000 (07:45 +0200)]
better abstraction for result writers using controlled vocabularies and
URIRefs

6 years agorefactoring
Michael Peter Christen [Fri, 10 Aug 2012 04:47:13 +0000 (06:47 +0200)]
refactoring

6 years agoadded two response writers for embedded solr interface:
Michael Peter Christen [Thu, 9 Aug 2012 16:06:48 +0000 (18:06 +0200)]
added two response writers for embedded solr interface:
a rss/opensearch writer and an enhanced solr xml writer.
The enhanced solr writer has less configuration overhead than the
original writer and should be slightly faster. The rss/opensearch writer
is at this time slightly incomplete compared with the already existing
rss search result from YaCy and also snippets are missing at this time.
To test the new interface, open for example:
http://localhost:8090/solr/select?wt=rss&q=olympia
The wt codes for the new result writers are:
wt=rss for opensearch
wt=exml for the enhanced solr xml writer.
Additionally, the SRU search parameters have been added to the solr
interface which can now also be used for a normal solr/xml search.

6 years agoFix an error in Russian translation: "can not" => "can".
Сковорода Никита Андреевич [Wed, 8 Aug 2012 07:35:45 +0000 (11:35 +0400)]
Fix an error in Russian translation: "can not" => "can".

6 years agoreplaced the multivalue generic string field name suffix _ss by _txt
Michael Peter Christen [Mon, 6 Aug 2012 15:58:09 +0000 (17:58 +0200)]
replaced the multivalue generic string field name suffix _ss by _txt
because _ss is not part of the standard solr example schema.