yacy:rc1.git
Michael Peter Christen [Tue, 6 May 2014 16:54:56 +0000 (18:54 +0200)]
Release 1.72 (tag: Release_1.72)

Michael Peter Christen [Tue, 6 May 2014 14:48:50 +0000 (16:48 +0200)]
enhanced snippets: remove lines which are identical to the title and
choose longer versions if possible. Prefer the description part.

Michael Peter Christen [Tue, 6 May 2014 12:51:57 +0000 (14:51 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

orbiter [Tue, 6 May 2014 03:58:51 +0000 (05:58 +0200)]
fix for navigation steering / p2p mode
see also:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5198&p=29958#p29958

orbiter [Tue, 6 May 2014 03:38:38 +0000 (05:38 +0200)]
Merge branch 'master' of git@gitorious.org:yacy/rc1.git

Marc Nause [Mon, 5 May 2014 21:16:01 +0000 (23:16 +0200)]
Improved Blacklist API:

*) added JSON support
*) fixed Exception in case of missing parameters
*) renamed parameter for items in "add entry" and "delete entry" from
"entry" to "item" to match term in XML

sixcooler [Mon, 5 May 2014 11:24:41 +0000 (13:24 +0200)]
do not check for segments-count on optimize:
this is also done in Solr and our getSegmentsCount() does not return
up-to-date values

reger [Sun, 4 May 2014 07:29:07 +0000 (09:29 +0200)]
content of surrogates/out never accessed (remove)
After import the content is never accessed but may take up a lot of disk space;
also, getLoadedOAIServer (which lists the files in surrogates/out) is not used.
This makes surrogates/out obsolete. XML files are no longer kept after import.

reger [Sat, 3 May 2014 19:57:06 +0000 (21:57 +0200)]
Merge origin/master

reger [Sat, 3 May 2014 19:55:10 +0000 (21:55 +0200)]
fix input-group layout on index.html
see bug http://mantis.tokeek.de/view.php?id=391

sixcooler [Fri, 2 May 2014 20:55:47 +0000 (22:55 +0200)]
remove tables from tabletracker on close to avoid lots of dead entries in
/PerformanceMemory_p.html

reger [Fri, 2 May 2014 17:32:09 +0000 (19:32 +0200)]
fix NPE on continuing crawls after YaCy restart
(Agent is then null)

orbiter [Fri, 2 May 2014 15:28:20 +0000 (17:28 +0200)]
Merge branch 'master' of git@gitorious.org:yacy/rc1.git

Marc Nause [Fri, 2 May 2014 12:18:52 +0000 (14:18 +0200)]
Key for parameter "blacklist name" is "list" in all servlets now.

Michael Peter Christen [Fri, 2 May 2014 06:16:43 +0000 (08:16 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

reger [Thu, 1 May 2014 23:15:03 +0000 (01:15 +0200)]
adjust search page layout - search box to current style

reger [Thu, 1 May 2014 22:35:54 +0000 (00:35 +0200)]
remove obsolete css class bookmarkfieldset

Michael Peter Christen [Wed, 30 Apr 2014 11:26:32 +0000 (13:26 +0200)]
added configuration option for maximum load and minimum RAM for
postprocessing

orbiter [Wed, 30 Apr 2014 05:42:52 +0000 (07:42 +0200)]
Merge branch 'master' of git@gitorious.org:yacy/rc1.git

Michael Peter Christen [Wed, 30 Apr 2014 04:46:06 +0000 (06:46 +0200)]
input-group for main search input window

Michael Peter Christen [Wed, 30 Apr 2014 04:21:53 +0000 (06:21 +0200)]
enhanced HostBrowser buttons and fixed text input alignment

Michael Peter Christen [Wed, 30 Apr 2014 03:14:01 +0000 (05:14 +0200)]
fix for strange fail reason

Michael Peter Christen [Wed, 30 Apr 2014 03:05:02 +0000 (05:05 +0200)]
use submitted default userAgent if cloning a crawl

Marc Nause [Tue, 29 Apr 2014 22:48:55 +0000 (00:48 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Marc Nause [Tue, 29 Apr 2014 22:48:38 +0000 (00:48 +0200)]
First draft of a blacklist API.

reger [Tue, 29 Apr 2014 20:51:01 +0000 (22:51 +0200)]
add display filter (active/disabled) to IndexSchema_p.html config
for easier overview of schema fields

Michael Peter Christen [Tue, 29 Apr 2014 17:51:01 +0000 (19:51 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Tue, 29 Apr 2014 17:50:33 +0000 (19:50 +0200)]
fix for slow crawling and better logging in balancer

Michael Peter Christen [Tue, 29 Apr 2014 17:24:05 +0000 (19:24 +0200)]
npe fix

Michael Peter Christen [Tue, 29 Apr 2014 17:13:54 +0000 (19:13 +0200)]
fix to menu colours

Michael Peter Christen [Tue, 29 Apr 2014 16:46:50 +0000 (18:46 +0200)]
small changes to search headline colour

Michael Peter Christen [Tue, 29 Apr 2014 14:24:21 +0000 (16:24 +0200)]
fix for result display

Michael Peter Christen [Tue, 29 Apr 2014 14:24:01 +0000 (16:24 +0200)]
design fixes to better use the new colours

Michael Peter Christen [Tue, 29 Apr 2014 14:23:42 +0000 (16:23 +0200)]
new default skin pdbootstrap which keeps the design shapes but slightly
changes the colours to match with bootstrap colours

Michael Peter Christen [Tue, 29 Apr 2014 14:22:31 +0000 (16:22 +0200)]
better buttons

reger [Mon, 28 Apr 2014 22:41:29 +0000 (00:41 +0200)]
add html5 audio/video <source> tag to html content scraper
- <source src=.. type=..> tag content is added to embed collection

Michael Peter Christen [Mon, 28 Apr 2014 09:52:13 +0000 (11:52 +0200)]
bootstrap update

Michael Peter Christen [Mon, 28 Apr 2014 07:17:21 +0000 (09:17 +0200)]
Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1

reger [Mon, 28 Apr 2014 02:59:47 +0000 (04:59 +0200)]
fix contentscraper img height/width parsing
prevent NumberFormatException on the common "100px" property value

- include in test case
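The commit does not show the parsing code itself. A minimal sketch, assuming the fix amounts to reading only the leading digits of a width/height value such as "100px" instead of handing the raw string to Integer.parseInt (which throws NumberFormatException on "100px"). The class ImgSizeParsing and its method are hypothetical, not YaCy's ContentScraper API:

```java
/** Hypothetical helper: lenient parsing of HTML img width/height attributes. */
final class ImgSizeParsing {

    private ImgSizeParsing() {}

    /**
     * Returns the leading integer of a value such as "100", "100px" or " 42 ",
     * or -1 if there are no leading ASCII digits or the number overflows an int.
     */
    static int parseDimension(String value) {
        if (value == null) return -1;
        String s = value.trim();
        int end = 0;
        while (end < s.length() && s.charAt(end) >= '0' && s.charAt(end) <= '9') end++;
        if (end == 0) return -1; // e.g. "auto" or "50%" handled as "unknown"... except "50%" has digits
        try {
            return Integer.parseInt(s.substring(0, end));
        } catch (NumberFormatException e) {
            // only ASCII digits were passed, so this can only be an int overflow
            return -1;
        }
    }
}
```

Returning -1 instead of throwing lets the scraper keep the image and simply treat its size as unknown.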

malykhin.dmitry [Sun, 27 Apr 2014 21:54:34 +0000 (01:54 +0400)]
Update russian translation

reger [Sun, 27 Apr 2014 20:22:00 +0000 (22:22 +0200)]
remove redundant javascript & id in index.html
to set focus to query field in IE11

reger [Sun, 27 Apr 2014 18:52:06 +0000 (20:52 +0200)]
optimize and fix lat / lon assignment

reger [Sun, 27 Apr 2014 16:20:33 +0000 (18:20 +0200)]
reimplement tighter lat/lon calc in URIMetadataNode
from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272

reger [Sat, 26 Apr 2014 20:27:59 +0000 (22:27 +0200)]
add exit proxy link to UrlProxy
On proxied pages, a link to exit the proxy is added to the top of the page.
The link text can be configured as a web.xml init-parameter (see default/web.xml); if it is missing, no link is displayed.

reger [Fri, 25 Apr 2014 23:30:51 +0000 (01:30 +0200)]
throw MalformedURLException on unknown protocol
for protocols other than the supported http, https, ftp, file, smb, \\ and mailto
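As an illustration only, a whitelist check of the kind this commit describes might look like the sketch below. The class ProtocolCheck and its methods are hypothetical (YaCy's real check lives in its URL classes and is not shown here); java.net.MalformedURLException is the standard JDK exception:

```java
import java.net.MalformedURLException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical protocol whitelist check, modelled on the commit message. */
final class ProtocolCheck {

    // protocols named in the commit message (a leading "\\" is handled separately)
    private static final Set<String> SUPPORTED =
            Set.of("http", "https", "ftp", "file", "smb", "mailto");

    private ProtocolCheck() {}

    /** True if url starts with "\\" or with one of the supported "protocol:" prefixes. */
    static boolean isSupported(String url) {
        if (url == null) return false;
        if (url.startsWith("\\\\")) return true; // UNC-style "\\host\share" path
        int colon = url.indexOf(':');
        if (colon <= 0) return false;
        String protocol = url.substring(0, colon).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(protocol);
    }

    /** Throws instead of guessing when the protocol is not supported. */
    static void check(String url) throws MalformedURLException {
        if (!isSupported(url)) {
            throw new MalformedURLException("unknown protocol: " + url);
        }
    }
}
```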

reger [Fri, 25 Apr 2014 18:15:55 +0000 (20:15 +0200)]
fix: resolve url without path but searchpart
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test"; now host="yacy.net", path="/"
fixes http://mantis.tokeek.de/view.php?id=47

added test case for getHost
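The rule behind this fix, sketched under the assumption that the host ends at the first '/', '?' or '#' after "://", and that a URL without an explicit path gets "/". HostSplit and hostAndPath are hypothetical names, not YaCy's MultiProtocolURL code; the "host" returned here would still include any port or user info:

```java
/** Hypothetical sketch: split host and path of an absolute URL that may lack a path. */
final class HostSplit {

    private HostSplit() {}

    /** Returns {host, path}; e.g. "http://yacy.net?q=test" gives {"yacy.net", "/"}. */
    static String[] hostAndPath(String url) {
        int schemeEnd = url.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3;
        // the host ends at the first '/', '?' or '#', not only at '/'
        int hostEnd = firstIndexOf(url, start, "/?#");
        String host = url.substring(start, hostEnd);
        String path = "/"; // default when no explicit path is present
        if (hostEnd < url.length() && url.charAt(hostEnd) == '/') {
            path = url.substring(hostEnd, firstIndexOf(url, hostEnd, "?#"));
        }
        return new String[] {host, path};
    }

    /** Index of the first character from 'stops' at or after 'from', else s.length(). */
    private static int firstIndexOf(String s, int from, String stops) {
        for (int i = from; i < s.length(); i++) {
            if (stops.indexOf(s.charAt(i)) >= 0) return i;
        }
        return s.length();
    }
}
```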

orbiter [Fri, 25 Apr 2014 07:26:20 +0000 (09:26 +0200)]
npe fix

orbiter [Fri, 25 Apr 2014 07:23:10 +0000 (09:23 +0200)]
npe fix

reger [Thu, 24 Apr 2014 23:05:28 +0000 (01:05 +0200)]
recover sax fatal error on OAI-PMH import of xml with entity error
this allows loading to continue with the next resumptionToken even if the import file caused a SAX parser error
fix http://mantis.tokeek.de/view.php?id=63

reger [Wed, 23 Apr 2014 21:41:10 +0000 (23:41 +0200)]
add current css to HTMLResponseWriter to fix metadata view
(using css from metas.template except js links)

orbiter [Wed, 23 Apr 2014 21:13:23 +0000 (23:13 +0200)]
Merge branch 'master' of git@gitorious.org:yacy/rc1.git

orbiter [Wed, 23 Apr 2014 21:13:07 +0000 (23:13 +0200)]
fixed a situation where finished crawls had not been detected.

orbiter [Wed, 23 Apr 2014 21:12:08 +0000 (23:12 +0200)]
better removal of stored urls when doing a crawl start

orbiter [Wed, 23 Apr 2014 21:11:37 +0000 (23:11 +0200)]
enhanced Host Balancer strategy: fair round robin

orbiter [Wed, 23 Apr 2014 06:41:36 +0000 (08:41 +0200)]
do not apply lazy value instantiation for numeric or boolean values
because that is misleading and confusing in case of 0- or false-values
and may cause NPEs in retrieval functions.

orbiter [Wed, 23 Apr 2014 06:37:19 +0000 (08:37 +0200)]
in case of short memory, do not cut down robinson peers to 1, just
reduce by 50%

reger [Tue, 22 Apr 2014 22:55:16 +0000 (00:55 +0200)]
exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
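A simplified illustration of what "remove all tags for the text property" means for an anchor text like <span>text</span>. This regex-based sketch is an assumption for illustration; YaCy's content scraper works on parsed tags, and the separate extraction of inline img tags is not shown. AnchorText and stripTags are hypothetical names:

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: reduce anchor-text HTML to plain text. */
final class AnchorText {

    // any tag-like sequence such as <span>, </span> or <b class="x">
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    private AnchorText() {}

    static String stripTags(String anchorHtml) {
        // replace tags by a space so that "a<br>b" does not become "ab"
        String text = TAG.matcher(anchorHtml).replaceAll(" ");
        // then collapse runs of whitespace and trim the ends
        return SPACES.matcher(text).replaceAll(" ").trim();
    }
}
```

Replacing tags with a space rather than the empty string is a trade-off: it keeps words separated across block tags, but splits words that are broken by inline markup such as "bo<b>ld</b>".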

orbiter [Tue, 22 Apr 2014 21:14:54 +0000 (23:14 +0200)]
added new button to terminate all crawls

orbiter [Tue, 22 Apr 2014 21:14:05 +0000 (23:14 +0200)]
catch IllegalArgumentException for wrong process types (that is needed
for migrations when new process types are introduced or disappear)

orbiter [Tue, 22 Apr 2014 17:48:49 +0000 (19:48 +0200)]
fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of
lazy value instantiation of 0-value in crawldepth_i

orbiter [Tue, 22 Apr 2014 17:35:15 +0000 (19:35 +0200)]
removed warnings

reger [Mon, 21 Apr 2014 15:28:21 +0000 (17:28 +0200)]
add custom Jetty errorhandler
to provide custom error page footer line
- remove redundant mime check in UrlProxyServlet

reger [Mon, 21 Apr 2014 15:16:06 +0000 (17:16 +0200)]
defer creation of new ArrayList after possible early return
(to avoid allocating an unused object)

reger [Sat, 19 Apr 2014 23:41:30 +0000 (01:41 +0200)]
refactor URIMetadataNode to further unify interaction with index
- URIMetadataNode now extends SolrDocument
- use language as stored (String), reducing conversion to string
- optimize debug code in transferIndex

reger [Fri, 18 Apr 2014 20:03:16 +0000 (22:03 +0200)]
- remove empty http0_9 status text array
  and unused default_charset = ISO-8859-1

reger [Fri, 18 Apr 2014 17:57:35 +0000 (19:57 +0200)]
- remove unused manual http KeepAlive config
    (reducing references to obsolete httpdemon)
- add port info to settings_http

Michael Peter Christen [Fri, 18 Apr 2014 04:51:46 +0000 (06:51 +0200)]
add canonical links to the same crawldepth, not the next crawldepth

Michael Peter Christen [Fri, 18 Apr 2014 04:51:10 +0000 (06:51 +0200)]
increased runtime for postprocessing query job

Michael Peter Christen [Fri, 18 Apr 2014 04:50:07 +0000 (06:50 +0200)]
special strategy for balancer: do not remove targets with zero wait time
from the queue

Michael Peter Christen [Thu, 17 Apr 2014 14:58:17 +0000 (16:58 +0200)]
fix for deadlocks in crawler

Michael Peter Christen [Thu, 17 Apr 2014 14:19:38 +0000 (16:19 +0200)]
increased resource.disk.used.max.steadystate and
resource.disk.used.max.overshot by 4 times because first users reached
that limit and wondered why the crawler was paused automatically :)

The crawler will now stop at 2TB disk usage :)

Michael Peter Christen [Thu, 17 Apr 2014 11:21:43 +0000 (13:21 +0200)]
added crawl depth for failed documents

Michael Peter Christen [Thu, 17 Apr 2014 10:55:38 +0000 (12:55 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Thu, 17 Apr 2014 10:54:18 +0000 (12:54 +0200)]
- better subgraph handling, less overhead for crawls without the
webgraph
- usage of crawler crawldepth cache for the linkgraph target depth
computation

Michael Peter Christen [Thu, 17 Apr 2014 10:52:54 +0000 (12:52 +0200)]
new Strategies in Balancer:
- doublecheck cache now records the crawl depth as well
- doublecheck cache is available from the outside (made static)
- no more need to crawl hosts with lowest depth first, instead all hosts
which have only singleton entries are preferred to reduce the number of
files.

Michael Peter Christen [Thu, 17 Apr 2014 10:44:05 +0000 (12:44 +0200)]
fix for Table in case that requested file does not exist and paths also
do not exist

reger [Thu, 17 Apr 2014 01:20:29 +0000 (03:20 +0200)]
implement gzip input handling directly in defaultservlet
(making reference to legacy httpdemon obsolete)

Michael Peter Christen [Wed, 16 Apr 2014 20:24:04 +0000 (22:24 +0200)]
fix for display bug

Michael Peter Christen [Wed, 16 Apr 2014 20:16:20 +0000 (22:16 +0200)]
removed clickdepth_i field and related postprocessing. This information
is now available in the crawldepth_i field which is identical to
clickdepth_i because of a specific crawler strategy.

Michael Peter Christen [Wed, 16 Apr 2014 19:34:28 +0000 (21:34 +0200)]
- added a new Crawler Balancer: HostBalancer and HostQueues:
This organizes all urls to be loaded in separate queues for each host.
Within each host, the urls are split by crawl depth, one queue per depth. The primary
rule for urls taken from any queue is that the crawl depth is minimal.
This produces a crawl depth which is identical to the clickdepth.
Furthermore, the crawl is able to create a much better balancing over
all hosts which is fair to all hosts that are in the queue.
This process will create a very large number of files for wide crawls in
the QUEUES folder: for each host a directory, for each crawl depth a
file inside the directory. A crawl with maxdepth = 4 will be able to
create tens of thousands of files. To be able to use that many file readers, it
was necessary to implement a new index data structure which opens the
file only if an access is wanted (OnDemandOpenFileIndex). Using such an
on-demand file reader should keep the number of open file pointers
below the system limit, which is usually about 10,000 open
files. Some parts of YaCy had to be adapted to handle the crawl depth
number correctly. The logging and the IndexCreateQueues servlet had to
be adapted to show the crawl queues differently, because the host name
is attached to the port on the host to differentiate between http,
https, and ftp services.
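The queueing rule described above (one queue per host and crawl depth, always take the lowest depth of a host, serve hosts in turn) can be modelled in memory as below. This is a sketch under assumptions, not YaCy's HostBalancer: the real implementation keeps one file per depth on disk, and the class and method names here are hypothetical. Host keys include the port, as the commit message describes:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of per-host, per-depth crawl queues. */
final class HostDepthBalancer {

    // host:port -> (crawl depth -> FIFO queue of urls); TreeMap keeps depths sorted
    private final Map<String, TreeMap<Integer, ArrayDeque<String>>> hosts = new HashMap<>();
    // hosts that currently have urls, in round-robin order
    private final ArrayDeque<String> rotation = new ArrayDeque<>();

    void push(String hostPort, int depth, String url) {
        TreeMap<Integer, ArrayDeque<String>> depths = hosts.get(hostPort);
        if (depths == null) {
            depths = new TreeMap<>();
            hosts.put(hostPort, depths);
            rotation.addLast(hostPort); // new host joins the end of the rotation
        }
        depths.computeIfAbsent(depth, d -> new ArrayDeque<>()).addLast(url);
    }

    /** Takes the lowest-depth url of the next host in turn, or null if all queues are empty. */
    String pop() {
        String hostPort = rotation.pollFirst();
        if (hostPort == null) return null;
        TreeMap<Integer, ArrayDeque<String>> depths = hosts.get(hostPort);
        Map.Entry<Integer, ArrayDeque<String>> lowest = depths.firstEntry();
        String url = lowest.getValue().pollFirst();
        if (lowest.getValue().isEmpty()) depths.remove(lowest.getKey());
        if (depths.isEmpty()) {
            hosts.remove(hostPort); // host is drained; it rejoins on its next push
        } else {
            rotation.addLast(hostPort); // back of the line: fair round robin
        }
        return url;
    }
}
```

Because every host gets exactly one url per round and each url taken is the shallowest one left for that host, no single large host can starve the others.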

Michael Peter Christen [Mon, 14 Apr 2014 11:32:35 +0000 (13:32 +0200)]
refactoring of the crawl balancer: the balancer is turned into an
interface and the old balancer class is moved into LegacyBalancer to
make room for a fresh implementation of a crawl balancer.

Michael Peter Christen [Mon, 14 Apr 2014 10:17:52 +0000 (12:17 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

reger [Sun, 13 Apr 2014 05:32:32 +0000 (07:32 +0200)]
autoupdate fails to download latest release (1.71) due to default release blacklist
- removed the default version blacklist regex from init (for future versions)

!!! left existing update blacklist setting untouched !!!
(existing installations that want to autoupdate to 1.71 need to change the blacklist in ConfigUpdate_p.html)

- moved old blacklist patch to migration.java

Michael Peter Christen [Fri, 11 Apr 2014 13:12:34 +0000 (15:12 +0200)]
fix for virtual root nodes

Michael Peter Christen [Fri, 11 Apr 2014 10:27:21 +0000 (12:27 +0200)]
find depth-matches also for edge targets

Michael Peter Christen [Fri, 11 Apr 2014 10:09:33 +0000 (12:09 +0200)]
introduction of a data structure for HyperlinkEdges which should use
less memory as it does no double-storage of source links for each edge
of the graph.
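The memory saving comes from storing each source once with all of its targets, rather than one (source, target) pair per edge. A hypothetical sketch of that layout follows. YaCy's HyperlinkEdges uses MultiProtocolURL objects; plain Strings are used here only for brevity, and EdgeStore is not a YaCy class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical edge store: each source url is stored once, with the list of its targets. */
final class EdgeStore {

    private final Map<String, List<String>> targetsBySource = new LinkedHashMap<>();
    private int edgeCount = 0;

    void addEdge(String source, String target) {
        // the source key is stored only when the first edge from it is added
        targetsBySource.computeIfAbsent(source, s -> new ArrayList<>()).add(target);
        edgeCount++;
    }

    /** Read-only view of the targets of a source (empty if the source is unknown). */
    List<String> targets(String source) {
        return Collections.unmodifiableList(
                targetsBySource.getOrDefault(source, Collections.emptyList()));
    }

    int size() { return edgeCount; }

    int sourceCount() { return targetsBySource.size(); }
}
```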

Michael Peter Christen [Fri, 11 Apr 2014 08:58:37 +0000 (10:58 +0200)]
using MultiProtocolURL for edge data which is faster (hash computation
is now much easier) and smaller in size

Michael Peter Christen [Fri, 11 Apr 2014 08:23:48 +0000 (10:23 +0200)]
enhanced hashcode computation for MultiProtocolURL

Michael Peter Christen [Fri, 11 Apr 2014 07:56:44 +0000 (09:56 +0200)]
fix for maximum tag length in parser

Michael Peter Christen [Fri, 11 Apr 2014 07:25:18 +0000 (09:25 +0200)]
refactoring of SystemLoad calls (only one backend tool)

Michael Peter Christen [Thu, 10 Apr 2014 21:46:35 +0000 (23:46 +0200)]
refactoring

orbiter [Thu, 10 Apr 2014 19:40:54 +0000 (21:40 +0200)]
Merge branch 'master' of git@gitorious.org:yacy/rc1.git

Michael Peter Christen [Thu, 10 Apr 2014 16:58:03 +0000 (18:58 +0200)]
strong redesign of html parser: object recursion is now done using a
stack of html tag objects, not a recursive parse-again method,
which may cause bad performance and huge memory allocation. The new
method also produces better parsed image objects with exact anchor text
references.
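To illustrate the general idea of replacing re-parsing with a stack of open tags, the sketch below collects anchor texts in a single pass: each open <a> pushes a text buffer, text is appended to every open buffer, and the closing </a> pops it. The regex tokenizer and the names AnchorCollector and anchorTexts are assumptions for illustration, far simpler than YaCy's html parser:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: collect anchor texts with an explicit tag stack, without re-parsing. */
final class AnchorCollector {

    // group 1: "/" for closing tags, group 2: tag name, group 3: text between tags
    private static final Pattern TOKEN = Pattern.compile("<(/?)([a-zA-Z]+)[^>]*>|([^<]+)");

    private AnchorCollector() {}

    static List<String> anchorTexts(String html) {
        ArrayDeque<StringBuilder> open = new ArrayDeque<>(); // one buffer per open <a>
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                // text belongs to every enclosing anchor
                for (StringBuilder sb : open) sb.append(m.group(3));
            } else if (m.group(2).equalsIgnoreCase("a")) {
                if (m.group(1).isEmpty()) {
                    open.push(new StringBuilder()); // <a ...> opens a new buffer
                } else if (!open.isEmpty()) {
                    result.add(open.pop().toString().trim()); // </a> closes the innermost
                }
            }
            // other tags such as <b> are skipped; their text still reaches the open anchors
        }
        return result;
    }
}
```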

Michael Peter Christen [Thu, 10 Apr 2014 07:08:59 +0000 (09:08 +0200)]
fix for wrong status codes of error pages

Michael Peter Christen [Wed, 9 Apr 2014 19:59:54 +0000 (21:59 +0200)]
also delete the robots.txt file from the cache when a new crawl is
started

orbiter [Wed, 9 Apr 2014 17:58:54 +0000 (19:58 +0200)]
Merge branch 'master' of git@gitorious.org:yacy/rc1.git

Michael Peter Christen [Wed, 9 Apr 2014 16:33:48 +0000 (18:33 +0200)]
fix for robots.txt handling: delete old entry before starting a new
crawl.

orbiter [Wed, 9 Apr 2014 15:52:51 +0000 (17:52 +0200)]
linkstructure refactoring to get more options for clickdepth analysis

Michael Peter Christen [Wed, 9 Apr 2014 10:45:15 +0000 (12:45 +0200)]
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git

Michael Peter Christen [Wed, 9 Apr 2014 10:45:04 +0000 (12:45 +0200)]
new structure and enhancements for link graph computation:
- added order option to solr queries to be able to retrieve document
lists in specific order, here: link length
- added HyperlinkEdge class which manages the link structure
- integrated the HyperlinkEdge class into clickdepth computation
- extended the linkstructure.json servlet to show also the clickdepth
and other statistic information