Hard to design, develop and test

Distributed systems are hard to design, develop and test because of uncertainties due to:

  • The large number of processes running in parallel
  • Processes that update their variables independently
  • Problems specific to the development of distributed applications that existing programming languages and tools do not directly address.
  • Decentralised crawlers need to coordinate so that they do not visit the same hashes multiple times (although some duplication is needed for failure tolerance), and the adopted crawling policy needs to be strictly enforced. Coordinating decentralised and/or distributed crawlers can incur significant communication overhead, which limits the number of simultaneous crawlers.
  • Which heuristics or classifiers should be used for resource allocation in distributed systems (crawl queue assignment and load balancing between peers)?
  • Finding good-enough recrawling strategies (heuristics).
  • With performant decentralised full-text search algorithms, we can move to fully decentralised search, in which nodes interact directly with IPFS when searching. To make the index horizontally scalable, the search engine will need to infer knowledge in negligible time, which calls for an efficient, scalable and distributed execution pipeline for clustering. The clustering can perhaps be achieved via a fuzzy similarity relation obtained as the transitive closure of a proximity relation.
  • And what if a dweb solution solving most of these already exists, and we can use it for a distributed index?
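
One way to reduce the coordination overhead between crawlers mentioned above is to let every crawler decide locally which hashes it owns, using rendezvous (highest-random-weight) hashing. This is only a sketch under assumptions, not a description of an existing implementation: the function names, the crawler identifiers and the replication factor of 2 are hypothetical, and it assumes every crawler knows the current list of peers.

```python
import hashlib


def rendezvous_owners(content_hash, crawler_ids, replicas=2):
    """Return the `replicas` crawlers responsible for `content_hash`.

    Every crawler can compute this locally from the shared peer list,
    so no messages are exchanged per hash. `replicas` > 1 gives the
    deliberate duplication needed for failure tolerance.
    """
    def score(crawler_id):
        # Hash the (crawler, content) pair; the highest scores win.
        digest = hashlib.sha256(f"{crawler_id}:{content_hash}".encode()).digest()
        return (int.from_bytes(digest[:8], "big"), crawler_id)

    ranked = sorted(crawler_ids, key=score, reverse=True)
    return ranked[:replicas]


def should_crawl(self_id, content_hash, crawler_ids, replicas=2):
    """True if this crawler is one of the owners of `content_hash`."""
    return self_id in rendezvous_owners(content_hash, crawler_ids, replicas)


# Hypothetical peer identifiers and content hash, for illustration only.
crawlers = ["crawler-a", "crawler-b", "crawler-c", "crawler-d"]
owners = rendezvous_owners("QmExampleHash", crawlers)
```

A property of this scheme is that when a crawler that does not own a given hash leaves the network, the owners of that hash stay the same, so only the hashes of the departed crawler need to be reassigned. It does not by itself enforce a crawling policy or handle disagreement about the peer list.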
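
The clustering idea from the list above (a fuzzy similarity relation obtained as the transitive closure of a proximity relation) can be illustrated on a single machine. The sketch below assumes the proximity relation is given as a reflexive, symmetric matrix with values in [0, 1], uses max-min composition for the closure, and reads clusters off an alpha-cut. The function names, the example matrix and the thresholds are assumptions made for illustration; a distributed pipeline would need a different execution strategy.

```python
def maxmin_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(proximity):
    """Max-min transitive closure of a reflexive, symmetric relation."""
    closure = [row[:] for row in proximity]
    while True:
        composed = maxmin_compose(closure, closure)
        # Union with the current relation, then stop at the fixed point.
        merged = [
            [max(a, b) for a, b in zip(row_c, row_m)]
            for row_c, row_m in zip(closure, composed)
        ]
        if merged == closure:
            return closure
        closure = merged


def alpha_cut_clusters(similarity, alpha):
    """Group items whose similarity is at least `alpha`.

    For a similarity relation (reflexive, symmetric, max-min transitive),
    every alpha-cut is an equivalence relation, so the groups are disjoint.
    """
    clusters, assigned = [], set()
    for i in range(len(similarity)):
        if i in assigned:
            continue
        cluster = [j for j in range(len(similarity)) if similarity[i][j] >= alpha]
        assigned.update(cluster)
        clusters.append(cluster)
    return clusters


# Made-up proximity values between four documents.
proximity = [
    [1.0, 0.8, 0.0, 0.1],
    [0.8, 1.0, 0.4, 0.0],
    [0.0, 0.4, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
similarity = transitive_closure(proximity)
print(alpha_cut_clusters(similarity, 0.5))  # [[0, 1], [2, 3]]
```

Lowering the threshold merges clusters: with this matrix an alpha of 0.4 puts all four documents into one group, because documents 1 and 2 are linked with strength 0.4.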

en/problems/dsearch/development.txt · Last modified: 2020/03/10 09:34 by 54.36.150.2