Michael Peter Christen [Thu, 27 Dec 2012 04:11:11 +0000 (05:11 +0100)]
Release 1.3 (tag: Release_1.3)

Michael Peter Christen [Thu, 27 Dec 2012 03:37:21 +0000 (04:37 +0100)]
updated slf4j and log4j

Michael Peter Christen [Thu, 27 Dec 2012 03:16:31 +0000 (04:16 +0100)]
update to pdf parser

Michael Peter Christen [Thu, 27 Dec 2012 02:19:21 +0000 (03:19 +0100)]
use the search configuration to default the cacheStrategy to the value
as given in the search configuration

Michael Peter Christen [Thu, 27 Dec 2012 02:17:45 +0000 (03:17 +0100)]
use solr boost configuration to select search fields. At this time it is
possible to enter a negative boost value to switch that value off. This
might be different in the future with a better input interface.

Michael Peter Christen [Thu, 27 Dec 2012 02:15:50 +0000 (03:15 +0100)]
update to search tests (use yacy interface and a bugfix)

Michael Peter Christen [Wed, 26 Dec 2012 20:25:27 +0000 (21:25 +0100)]
- made image search in interactive search using the ViewImage servlet -
that enables viewing of images for intranet SMB servers.
- added a filter search for protocol, tld and ext again; otherwise p2p
search produces a lot of rubbish

Michael Peter Christen [Wed, 26 Dec 2012 18:15:11 +0000 (19:15 +0100)]
fix for smb crawl situation (lost too many urls)

reger [Mon, 24 Dec 2012 22:29:02 +0000 (23:29 +0100)]
SeedUpload url : check to reject localhost url included in saveSeedList (same check as in / copied from Seed.isProper() ), to prevent identity change on next startup (due to rejected seeduploadurl).

reger [Mon, 24 Dec 2012 03:13:38 +0000 (04:13 +0100)]
fix SeedUpload setting property name for include template file

reger [Sun, 23 Dec 2012 00:30:52 +0000 (01:30 +0100)]
- apply fix for localhost handling (from yacy2solr) also to metadata2solr

reger [Sat, 22 Dec 2012 22:03:39 +0000 (23:03 +0100)]
fix: exception if default work files don't exist

Michael Peter Christen [Sat, 22 Dec 2012 20:16:22 +0000 (21:16 +0100)]
fix for event starter: delete start time when event is removed

Michael Peter Christen [Sat, 22 Dec 2012 19:54:05 +0000 (20:54 +0100)]
copy work tables from defaults/data/work if exist there and not in
This can be used to create start-up behavior work scripts in the
api.bheap table

Michael Peter Christen [Sat, 22 Dec 2012 19:52:52 +0000 (20:52 +0100)]
fix for config basic: do not accept empty peer names

Michael Peter Christen [Sat, 22 Dec 2012 15:27:14 +0000 (16:27 +0100)]
extended the Scheduler: introduced scheduled events
- an event type (once, regular) can be selected
- for this event type, a fixed time can be selected. This may be either
directly after startup or at one of the full hours at a day (==25
The main point of this feature is the opportunity to start an action
directly after startup. That makes it possible to create YaCy
distributions which, when started for the first time, begin to index
parts of the intranet/internet by themselves.

Michael Peter Christen [Wed, 19 Dec 2012 11:45:40 +0000 (12:45 +0100)]
removed protocol, tld, ext from the urlmask and created specific
navigation field for these

Michael Peter Christen [Wed, 19 Dec 2012 09:41:22 +0000 (10:41 +0100)]
search process enhancements

Michael Peter Christen [Wed, 19 Dec 2012 01:38:05 +0000 (02:38 +0100)]
- removed all extension types from extension navigation which are not
- automatically show the protocol navigation if there is more than http
and https
- automatically show the extension navigation if there is some media

Michael Peter Christen [Wed, 19 Dec 2012 00:56:33 +0000 (01:56 +0100)]
using the author field as solr-native facet. this makes it necessary to
introduce a copy-field for the author field to be copied to a string
field. This field is then used to generate facets. Without this field,
the facet would consist only of the words of the author names, not of
the full author string.
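
The difference between faceting on the tokenized author field and on its string copy can be illustrated with a small Python sketch. This is not YaCy or Solr code: the author names are made up, and a plain whitespace split stands in for the tokenizer of a Solr text field.

```python
from collections import Counter

# Hypothetical author values taken from three indexed documents.
authors = ["Jane Doe", "Jane Doe", "John Doe"]

def facet_counts_tokenized(values):
    """Facet over a tokenized text field: every word becomes a facet value."""
    counts = Counter()
    for value in values:
        # set() so that a word is counted once per document
        counts.update(set(value.lower().split()))
    return counts

def facet_counts_string(values):
    """Facet over the string copy: the full author string is the facet value."""
    return Counter(values)

tokenized = facet_counts_tokenized(authors)
full = facet_counts_string(authors)
print(tokenized)
print(full)
```

The tokenized variant reports facet values such as "doe", while the string copy keeps "Jane Doe" and "John Doe" apart, which is what an author navigation needs.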

Michael Peter Christen [Wed, 19 Dec 2012 00:54:35 +0000 (01:54 +0100)]
using the publisher information for the author field if no author is
given. This applies to cases where only the copyright field in the html
header is filled but not the author field

Michael Peter Christen [Wed, 19 Dec 2012 00:00:57 +0000 (01:00 +0100)]
- using a filter query for facet restriction
- calculating the whole search result in at most two sub-queries from

Michael Peter Christen [Tue, 18 Dec 2012 23:59:40 +0000 (00:59 +0100)]
using the solr facets as navigation in yacyinteractive.html instead of
counting locally result types

Michael Peter Christen [Tue, 18 Dec 2012 16:20:42 +0000 (17:20 +0100)]
added another solr field clickdepth_i which reflects the number of
clicks which are necessary to get from the portal of a host to a
specific document. At this time, only the start document is flagged with
clickdepth '0', all other with '-1'. To get the actual clickdepth, a
process must use crawled information to collect the actual number of
clicks. This will be added in another/next step.
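
One possible way to derive the actual clickdepth from crawled link information is a breadth-first search that starts at the portal document. The Python sketch below is only an illustration under that assumption; it is not the process YaCy later implemented, and the function name, the link map and the example.org URLs are hypothetical.

```python
from collections import deque

def click_depths(start_url, outlinks):
    """Return {url: depth}: minimum number of clicks from start_url, -1 if unreachable."""
    urls = set(outlinks) | {start_url}
    for targets in outlinks.values():
        urls.update(targets)
    depths = {url: -1 for url in urls}
    depths[start_url] = 0
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in outlinks.get(url, ()):
            if depths[target] == -1:  # not reached yet
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Hypothetical link structure of one host: page -> pages it links to.
links = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/c"],
    "http://example.org/b": ["http://example.org/c", "http://example.org/"],
    "http://example.org/orphan": ["http://example.org/a"],
}
depths = click_depths("http://example.org/", links)
```

Documents that cannot be reached from the start document keep the value -1, which matches the placeholder value described above.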

Michael Peter Christen [Tue, 18 Dec 2012 13:42:35 +0000 (14:42 +0100)]
- added a new solr field references_i which stores the number of
INCOMING links to the corresponding web page. This information is taken
from the reverse link index (a 'little sister' of the RWI index).
- this field can be of use to enhance the ranking because a web page
with more incoming links can be more important than others. But
this is not true for typical link pages like menus. Therefore the
number of outgoing links is needed.
- added a new solr attribute 'bf' to solr queries which is a boost
function extension. This field can contain a formula which computes the
boost according to given field values. After some experiments the
following formula is now default:
This takes the number of references and the inbound links. Further
experiments are needed to enhance that formula.

Michael Peter Christen [Tue, 18 Dec 2012 11:52:20 +0000 (12:52 +0100)]
- fix for localhost detection
- added IPv6 patterns for localhost detection
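
A localhost check that covers host names as well as IPv4 and IPv6 loopback literals could look like the following Python sketch. It uses the standard `ipaddress` module instead of the patterns mentioned in the commit, so it shows the idea only; the function name and the list of accepted names are assumptions.

```python
import ipaddress

def is_localhost(host):
    """Return True if host names the local machine (name, IPv4 or IPv6 loopback)."""
    host = host.strip().lower()
    # URLs write IPv6 literals in brackets, e.g. http://[::1]/
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    if host in ("localhost", "localhost.localdomain"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # not an IP literal, so it is some other host name
        return False
```

Both the short form `::1` and the long form `0:0:0:0:0:0:0:1` are recognised, as is the whole 127.0.0.0/8 range.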

Michael Peter Christen [Tue, 18 Dec 2012 01:29:03 +0000 (02:29 +0100)]
removed dependency of vocabulary navigation on Jena and its
triplestore; the vocabulary search is now done using generic solr fields
which are created on-the-fly during runtime.

reger [Sun, 16 Dec 2012 20:01:13 +0000 (21:01 +0100)]
PerformanceQueues: disable input for hardcoded httpd performance values

reger [Sun, 16 Dec 2012 19:53:45 +0000 (20:53 +0100)]
fix: set default language to "en"

Michael Peter Christen [Sat, 15 Dec 2012 08:14:49 +0000 (09:14 +0100)]
- fixes for host navigation
- fixes for filetype navigation
- removed unused code

Michael Peter Christen [Fri, 14 Dec 2012 23:05:46 +0000 (00:05 +0100)]
distinguishing modified query string and original query string

Michael Peter Christen [Tue, 11 Dec 2012 12:38:28 +0000 (13:38 +0100)]
- fixed 'delete from subpath' during crawl start which deleted nothing;
now works;
- changed some crawl start html design details

orbiter [Mon, 10 Dec 2012 20:18:56 +0000 (21:18 +0100)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Mon, 10 Dec 2012 20:17:45 +0000 (21:17 +0100)]
if maxFileSize < 0 then there is no file size limit.

reger [Mon, 10 Dec 2012 20:08:04 +0000 (21:08 +0100)]
quickfix for translated link containing word "browse" in ru & uk, see bugs.yacy.net/view.php?id=213

orbiter [Mon, 10 Dec 2012 20:01:14 +0000 (21:01 +0100)]
more search command tools

orbiter [Mon, 10 Dec 2012 20:00:30 +0000 (21:00 +0100)]
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.

orbiter [Mon, 10 Dec 2012 19:59:43 +0000 (20:59 +0100)]
allow larger no-proxy expressions

orbiter [Mon, 10 Dec 2012 19:55:11 +0000 (20:55 +0100)]
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.

orbiter [Mon, 10 Dec 2012 19:44:29 +0000 (20:44 +0100)]
re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without using this method all remote urls
including the localhost had been accessed through the configured proxy

reger [Mon, 10 Dec 2012 19:02:35 +0000 (20:02 +0100)]
fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html)
see bug http://bugs.yacy.net/view.php?id=215

orbiter [Mon, 10 Dec 2012 06:22:42 +0000 (07:22 +0100)]
- fix for bad url conversion in bookmarks when using smb urls
- fix for localhost hosts in solr schema host handling

reger [Sat, 8 Dec 2012 05:34:48 +0000 (06:34 +0100)]
- making blacklist path part case insensitive (solving bugs.yacy.net/view.php?id=171)
- blacklist test adding explicit response text "not blocked" if no blacklist match

reger [Fri, 7 Dec 2012 23:19:20 +0000 (00:19 +0100)]
remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list).

Michael Peter Christen [Fri, 7 Dec 2012 14:49:23 +0000 (15:49 +0100)]
introduced a better place to update the lastacc time value in latency

Michael Peter Christen [Fri, 7 Dec 2012 14:35:44 +0000 (15:35 +0100)]
removed Latency update after URL selection because that causes
a completely wrong behaviour when cache fresh cases appear. Makes
re-crawling MUCH faster!

Michael Peter Christen [Fri, 7 Dec 2012 13:56:34 +0000 (14:56 +0100)]
- clear the search cache when altering the solr boosts
- better positions for submit buttons

Michael Peter Christen [Fri, 7 Dec 2012 13:54:49 +0000 (14:54 +0100)]
using a filter query for the site parameter in GSA api

Michael Peter Christen [Fri, 7 Dec 2012 01:00:12 +0000 (02:00 +0100)]
latency fix: only set last-visit time if access was actually by the

Michael Peter Christen [Fri, 7 Dec 2012 00:35:02 +0000 (01:35 +0100)]
fix for bad xml in gsa result when doing a query with quotes

Michael Peter Christen [Fri, 7 Dec 2012 00:27:24 +0000 (01:27 +0100)]
added another blacklist-cleaner into balancer

Michael Peter Christen [Thu, 6 Dec 2012 23:31:10 +0000 (00:31 +0100)]
fix for wrong display of error urls in HostBrowser

Michael Peter Christen [Thu, 6 Dec 2012 16:40:52 +0000 (17:40 +0100)]
fix for waitingtime computation for intranet configuration

Michael Peter Christen [Wed, 5 Dec 2012 23:12:16 +0000 (00:12 +0100)]
- check blacklist (again) when taking urls from the crawl stack because
the blacklist may get extended during crawling
- removed debug output

Michael Peter Christen [Wed, 5 Dec 2012 21:05:49 +0000 (22:05 +0100)]
patch for funny symbols in url paths (like tilde)

Michael Peter Christen [Wed, 5 Dec 2012 17:20:43 +0000 (18:20 +0100)]
more robustness during shutdown

Michael Peter Christen [Wed, 5 Dec 2012 17:16:06 +0000 (18:16 +0100)]
Brute-force attempt to start solr in case of a memory problem.
I don't actually know if this is correct. It is a desperate try to get
YaCy running on production servers which must get alive even with
strange hacks like this. This is also related to a forum posting in

Michael Peter Christen [Wed, 5 Dec 2012 11:26:42 +0000 (12:26 +0100)]
update to Solr Boost handling

Michael Peter Christen [Mon, 3 Dec 2012 16:01:19 +0000 (17:01 +0100)]
Added a new servlet to configure the solr ranking using field boosts

Michael Peter Christen [Sun, 2 Dec 2012 23:01:41 +0000 (00:01 +0100)]
renamed Ranking_p.html to RankingRWI_p.html
because there will be another Ranking servlet as well at next

Michael Peter Christen [Sun, 2 Dec 2012 16:29:37 +0000 (17:29 +0100)]
enhanced exists()-method for solr; should reduce a lot of IO during DHT
target selection

Michael Peter Christen [Sun, 2 Dec 2012 15:54:29 +0000 (16:54 +0100)]
added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time when a query to solr is done.

Michael Peter Christen [Sun, 2 Dec 2012 15:53:02 +0000 (16:53 +0100)]
added number of characters in url to default index to be able to use
this field for ranking

Michael Peter Christen [Sun, 2 Dec 2012 15:52:12 +0000 (16:52 +0100)]
added more logging to get info which url causes performance problems

reger [Sat, 1 Dec 2012 21:41:21 +0000 (22:41 +0100)]
fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*')
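
The problem and the fix can be reproduced in a few lines of Python: a bare '*' is not a valid regular expression because there is nothing for it to repeat, while '.*' matches any path. The helper name below is hypothetical and the sketch does not reproduce YaCy's blacklist import code.

```python
import re

def blacklist_path_pattern(path):
    """Compile a blacklist path entry; a bare '*' is widened to '.*'."""
    if path == "*":
        # '*' alone cannot be compiled (nothing to repeat); '.*' matches any path
        path = ".*"
    return re.compile(path)

# Show that the raw entry really fails to compile.
try:
    re.compile("*")
    raw_star_compiles = True
except re.error:
    raw_star_compiles = False
```

With the extension in place, an imported entry with path '*' blocks every path of its host instead of aborting the import.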

reger [Sat, 1 Dec 2012 00:14:29 +0000 (01:14 +0100)]
fix: respect config setting of "show Nav Top-Menu" in HostBrowser.html for public users (as hostbrowser is now available in search results)

reger [Tue, 27 Nov 2012 23:09:53 +0000 (00:09 +0100)]
prevent Solr "version conflict" on update by setting the Solr "_version_" field to 0 (=no version check)

Michael Peter Christen [Mon, 26 Nov 2012 14:18:51 +0000 (15:18 +0100)]
improvements in GSA result writer

Michael Peter Christen [Mon, 26 Nov 2012 12:40:53 +0000 (13:40 +0100)]
replaced more split and replaceAll calls that lacked pattern
pre-compilation with pre-compiled patterns

Michael Peter Christen [Mon, 26 Nov 2012 12:11:55 +0000 (13:11 +0100)]
using more pre-compiled patterns for split methods

Michael Peter Christen [Mon, 26 Nov 2012 11:24:35 +0000 (12:24 +0100)]
enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed

Michael Peter Christen [Sun, 25 Nov 2012 23:14:57 +0000 (00:14 +0100)]
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1

reger [Sun, 25 Nov 2012 21:49:26 +0000 (22:49 +0100)]
fix: display and calculate authors and namespace search navigator if configured (otherwise skip overhead)
(leave hosts, topics and the filetype and protocol navigators, which are not included in ConfigPortal, untouched)

Michael Peter Christen [Sun, 25 Nov 2012 14:43:42 +0000 (15:43 +0100)]
added debug code to crawler monitor

Michael Peter Christen [Sun, 25 Nov 2012 13:41:49 +0000 (14:41 +0100)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Sun, 25 Nov 2012 11:20:41 +0000 (12:20 +0100)]
added link to
to the /RegexTest.html servlet

orbiter [Sun, 25 Nov 2012 10:58:57 +0000 (11:58 +0100)]
start the local search only if this peer is doing a remote search or
when it is doing a local search and the peer is old

Michael Peter Christen [Sun, 25 Nov 2012 00:34:39 +0000 (01:34 +0100)]
- removed multi-add of documents (not used)
- inserted specialized code for size request

Michael Peter Christen [Sat, 24 Nov 2012 21:31:46 +0000 (22:31 +0100)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git


Michael Peter Christen [Sat, 24 Nov 2012 21:30:05 +0000 (22:30 +0100)]
- added a field cache for solr queries which call only for a single
- fixed a version conflict exception within a solr add request

orbiter [Sat, 24 Nov 2012 09:27:29 +0000 (10:27 +0100)]
fixes for filesystem indexing

Michael Peter Christen [Fri, 23 Nov 2012 13:09:48 +0000 (14:09 +0100)]
introduced more structure in HostBrowser, table view, better counting,
distinguishing of error cases (fail/excluded)

Michael Peter Christen [Fri, 23 Nov 2012 13:00:30 +0000 (14:00 +0100)]
added a new fail type attribute for the index to distinguish two
separate fail types: network fail and forced exclusion (i.e. by robots
or forwarding rules).

Michael Peter Christen [Fri, 23 Nov 2012 12:58:39 +0000 (13:58 +0100)]
- added another enumeration method in kelondro data structure to get a
more random access to data for the balancer
- added random access inside the balancer

Michael Peter Christen [Fri, 23 Nov 2012 00:35:28 +0000 (01:35 +0100)]
removed overhead by preventing generation of full search results when
only the url is requested

Michael Peter Christen [Thu, 22 Nov 2012 12:03:33 +0000 (13:03 +0100)]
- using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list

Michael Peter Christen [Wed, 21 Nov 2012 17:46:49 +0000 (18:46 +0100)]
added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignature.
As a result, a signature of the document is written to the solr search
index. Additionally, each time a signature is written, it is
checked whether the signature already exists in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This affected mainly caching of 'exists' searches to enhance
the check for existing signatures and do this without actually doing a
solr query.
Because a long number is used here for the first time as a value in the
Solr store, the value naming in the YaCySchema also had to be adapted and
normalized. As a result, many files had to be changed.
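
The flow described above (compute a signature, check whether it is already known, flag the document as unique, move duplicates to the end) can be sketched in Python as follows. The signature here is a plain SHA-1 hash over the normalized word sequence, a deliberately simple stand-in for the fuzzy TextProfileSignature; the class name, the field names and the in-memory set that replaces the cached 'exists' lookup are all assumptions.

```python
import hashlib
import re

def signature(text):
    """Stand-in signature: a signed 64-bit number from a hash of the normalized words."""
    words = re.findall(r"\w+", text.lower())
    digest = hashlib.sha1(" ".join(words).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class SignatureIndex:
    """Remembers signatures already written, so 'exists' checks need no index query."""

    def __init__(self):
        self._seen = set()

    def add(self, url, text):
        sig = signature(text)
        unique = sig not in self._seen  # first document with this signature
        self._seen.add(sig)
        return {"url": url, "signature": sig, "unique": unique}

index = SignatureIndex()
docs = [
    index.add("http://example.org/a", "Hello,  World!"),
    index.add("http://example.org/b", "hello world"),
    index.add("http://example.org/c", "Something else"),
]
# Unique documents first, duplicates at the end; sorted() keeps the order otherwise.
ranked = sorted(docs, key=lambda d: not d["unique"])
```

An exact hash only catches documents whose normalized text is identical; a profile-based signature is more tolerant of small differences, which is why the commit relies on one.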

Michael Peter Christen [Mon, 19 Nov 2012 16:24:34 +0000 (17:24 +0100)]
- added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during acquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned

Michael Peter Christen [Sun, 18 Nov 2012 21:11:04 +0000 (22:11 +0100)]
Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'

Michael Peter Christen [Sun, 18 Nov 2012 21:04:34 +0000 (22:04 +0100)]
Merge remote-tracking branch 'reger/master'

Michael Peter Christen [Sun, 18 Nov 2012 21:04:11 +0000 (22:04 +0100)]
Merge remote-tracking branch 'regerdev/master'

Michael Peter Christen [Sun, 18 Nov 2012 15:03:34 +0000 (16:03 +0100)]
FINALLY YaCy can now search for full strings using double- or
single-quoted strings in the search query line!!!

orbiter [Sun, 18 Nov 2012 00:22:41 +0000 (01:22 +0100)]
redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.

cominch [Tue, 13 Nov 2012 16:32:19 +0000 (17:32 +0100)]
content control: use up-to-date definitions

Michael Peter Christen [Tue, 13 Nov 2012 15:54:28 +0000 (16:54 +0100)]
added deletion of hosts during crawl start if deleteold option was given

Michael Peter Christen [Tue, 13 Nov 2012 10:45:56 +0000 (11:45 +0100)]
because we have the inurl:<term> - searchmodifier, we don't actually
need regular expressions as search attributes. They have now been removed
from the advanced search page while they are still created internally.
The filter is then expressed against solr as regular expression filter
query. If the expression points out a selection of a specific protocol,
host or filetype this is then translated into a faceted query.

orbiter [Tue, 13 Nov 2012 09:54:21 +0000 (10:54 +0100)]
- redesign of crawl start servlet
- for domain-limited crawls, the domain is deleted now by default before
the crawl is started

orbiter [Mon, 12 Nov 2012 10:19:39 +0000 (11:19 +0100)]
- removed scheduled crawling options in crawl start because it is
superfluous there; it can be changed in the scheduler servlet. It's also
confusing in the presence of the delete-option, which will be
implemented next.
- removed unused crawl start servlet
- some refactoring to make the time parser reusable

cominch [Mon, 12 Nov 2012 10:17:50 +0000 (11:17 +0100)]
SMW Import: replaced JSON import routines with stable ones

reger [Sun, 11 Nov 2012 20:19:18 +0000 (21:19 +0100)]
fix: remove fixed individual testing IP ( = server4you.de) from default/yacy.network.freeworld.unit