Budget:

Small project <800

Posted on

7/5/16 10:20 AM

Buyer:

KVK***

This project has expired

Description

Logical requirements

I need a web scraper for the German website [OBSCURED]. It should iteratively query the site's search engine for a set of user-defined keywords (defined in a config file). The results of each query should be filtered according to two criteria:

  • Does the search result match a user-defined regular expression (defined in a config file)?

  • Is the result new, i.e. was it not already found in a previous run? (The results are listed by date in descending order, so the first already-known result is also the abort criterion for that keyword.)
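As an illustration only, the two filter criteria could be sketched in Python roughly as follows. The function name `filter_new_matches`, the representation of each result as a plain string, and the `seen` set holding results from earlier runs are all assumptions made for this sketch, not part of the requirements.

```python
import re
from typing import Iterable, List, Set


def filter_new_matches(results: Iterable[str], pattern: str, seen: Set[str]) -> List[str]:
    """Return the new results that match `pattern`.

    `results` is assumed to be ordered newest first, so scanning stops
    at the first result that was already found in a previous run.
    """
    regex = re.compile(pattern)
    fresh = []
    for result in results:
        if result in seen:
            # Abort criterion: everything after this point is older
            # and was already handled in an earlier run.
            break
        if regex.search(result):
            fresh.append(result)
    return fresh


# Hypothetical sample data, newest result first.
sample_results = ["Java dev Berlin", "Perl dev", "Scala dev Munich", "Java dev Hamburg"]
already_seen = {"Scala dev Munich"}
new_hits = filter_new_matches(sample_results, r"Java|Scala", already_seen)
```

In this sample, scanning stops at "Scala dev Munich", so the older "Java dev Hamburg" entry is never inspected.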

If both criteria are met, the scraper should extract specific data from the query result and store it in a database. Furthermore, the tool should notify a user by email about the newly found information.
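A minimal sketch of the store-and-notify step, assuming SQLite as the database and Python's standard library for the email. The table layout (`url`, `title`, `found_on`), the function names, and the addresses are hypothetical placeholders; the actual fields to extract are not specified here.

```python
import sqlite3
from email.message import EmailMessage


def store_results(conn, rows):
    """Insert (url, title, found_on) rows, skipping URLs already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "url TEXT PRIMARY KEY, title TEXT, found_on TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO results (url, title, found_on) VALUES (?, ?, ?)",
            rows,
        )


def build_notification(rows, sender, recipient):
    """Build (but do not send) an email that lists the new results."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{len(rows)} new results"
    msg.set_content("\n".join(f"{title}: {url}" for url, title, _ in rows))
    return msg


# Hypothetical sample rows; an in-memory database keeps the sketch self-contained.
rows = [
    ("http://example.invalid/1", "First hit", "2016-07-05"),
    ("http://example.invalid/2", "Second hit", "2016-07-05"),
]
conn = sqlite3.connect(":memory:")
store_results(conn, rows)
store_results(conn, rows)  # a second run does not create duplicates
message = build_notification(rows, "scraper@example.invalid", "user@example.invalid")
# Sending would use e.g. smtplib.SMTP(host).send_message(message).
```

Using the URL as primary key is one possible way to make repeated runs idempotent; any other unique attribute of a result would serve equally well.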

Technical requirements

  • I need full source code access. Code should be well documented.

  • The tool must run under Linux on a low-performance device (such as a Raspberry Pi).

  • Languages

    • Preferred: Java Version 8 or Scala 2.11

    • Alternatively: C++/Python

    • NO Perl

  • It should be possible to specify a proxy.

  • The storage database should be MS Access compatible or SQLite.
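To make the configuration-related requirements more concrete, here is one possible sketch of a config file and its loading code in Python. The INI layout, the section and option names, the proxy address, and the function names are all assumptions for illustration; any other config format would do.

```python
import configparser
import re
import urllib.request

# Hypothetical config file content; all names and values are placeholders.
SAMPLE_CONFIG = r"""
[search]
keywords = java, scala
pattern = (?i)raspberry\s+pi

[network]
proxy = http://proxy.example:3128

[storage]
database = results.sqlite
"""


def load_settings(text):
    """Parse the config text into a plain dictionary."""
    # Interpolation is disabled so that '%' may appear in the regular expression.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "keywords": [k.strip() for k in parser.get("search", "keywords").split(",")],
        "pattern": re.compile(parser.get("search", "pattern")),
        "proxy": parser.get("network", "proxy", fallback=None),
        "database": parser.get("storage", "database"),
    }


def build_opener(proxy):
    """Return a urllib opener that routes requests through `proxy` if one is given."""
    if proxy:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()


settings = load_settings(SAMPLE_CONFIG)
opener = build_opener(settings["proxy"])
# opener.open(url) would then fetch a page through the configured proxy.
```

Leaving out the `proxy` option makes `load_settings` return `None` for it, in which case the opener connects directly.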

 

You can contact me in either English or German.