Distributed crawler

Decentralised web crawling would use many computers to index the IPFS space. Users would voluntarily offer their own computing and bandwidth resources towards crawling the IPFS corpus. By spreading the load of these tasks across many machines, the costs that would otherwise be spent on maintaining large computing clusters can be avoided.

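As a minimal sketch of how crawl work could be divided among volunteer peers without a central coordinator, the snippet below uses rendezvous (highest-random-weight) hashing to map content identifiers (CIDs) to peers. The peer names, the `assign_peer` helper and the scoring scheme are illustrative assumptions, not part of any existing implementation.

```python
import hashlib

def rendezvous_score(peer_id: str, cid: str) -> int:
    """Deterministic pseudo-random score for a (peer, CID) pair."""
    digest = hashlib.sha256(f"{peer_id}:{cid}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def assign_peer(cid: str, peers: list[str]) -> str:
    """Assign a CID to the peer with the highest rendezvous score.

    Every peer can compute the same assignment locally, so no central
    crawl queue is needed; when a peer joins or leaves, only the CIDs
    whose top-scoring peer changes need to be reassigned.
    """
    return max(peers, key=lambda p: rendezvous_score(p, cid))

if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c"]
    print(assign_peer("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG", peers))
```
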
  • Heuristics or classifiers for resource allocation in distributed systems (crawl queue assignment and load balancing between peers)? See the load-balancing sketch after this list.
    • Stochastic heuristics
    • Features to extract for classifiers
    • Effects on precision and recall
  • Attributes to use for a classifier for the Validator, possibly drawing on breach/scam/fraud cases provided by third parties (coalition partners); see the feature-extraction sketch after this list.
  • Good-enough recrawling strategies (heuristics).
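
As one example of a stochastic load-balancing heuristic, the sketch below hands crawl tasks to a peer drawn at random with probability proportional to its self-reported spare bandwidth, so better-provisioned peers receive more work without starving the rest. The `Peer` record and the capacity figures are assumptions for illustration only.

```python
import random
from dataclasses import dataclass

@dataclass
class Peer:
    peer_id: str
    spare_bandwidth: float  # self-reported estimate, e.g. in Mbit/s

def pick_peer(peers: list[Peer]) -> Peer:
    """Pick a peer at random, weighted by reported spare bandwidth."""
    weights = [max(p.spare_bandwidth, 0.0) for p in peers]
    if sum(weights) == 0:
        # No capacity information available: fall back to a uniform choice.
        return random.choice(peers)
    return random.choices(peers, weights=weights, k=1)[0]

if __name__ == "__main__":
    peers = [Peer("peer-a", 50.0), Peer("peer-b", 10.0), Peer("peer-c", 0.5)]
    picks = [pick_peer(peers).peer_id for _ in range(1000)]
    print({p.peer_id: picks.count(p.peer_id) for p in peers})
```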
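
For the Validator classifier question above, the snippet below sketches the kind of attributes that could be extracted from crawled metadata before training on labelled breach/scam/fraud cases. All field names and features are assumptions whose effect on precision and recall would still have to be measured.

```python
import math

def extract_features(doc: dict) -> dict[str, float]:
    """Turn crawled document metadata into numeric classifier features.

    `doc` is assumed to carry fields such as 'title', 'size', 'links'
    and 'first_seen_days_ago'; real field names depend on the crawler.
    """
    title = doc.get("title") or ""
    links = doc.get("links") or []
    return {
        "log_size": math.log1p(doc.get("size", 0)),
        "title_length": float(len(title)),
        "title_digit_ratio": sum(c.isdigit() for c in title) / max(len(title), 1),
        "outlink_count": float(len(links)),
        "age_days": float(doc.get("first_seen_days_ago", 0)),
    }

if __name__ == "__main__":
    example = {"title": "Free giveaway, claim now", "size": 2048,
               "links": ["link-1", "link-2"], "first_seen_days_ago": 3}
    print(extract_features(example))
```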