KNOWLEDGE BASE

Process server configuration

14.03.2006 10:48

KB050097 | EXTRACT DATA FROM WEB SITES

INF

Product: Elvis
As of: 2006-03-14

Summary

This example shows how to use the “Custom Port” to make data from web sites available as datapoints. In the example the temperature forecast from www.wetteronline.de for Kalchreuth is extracted.

Download example project and script: inet.zip (0.1 MB)

Hint: Elvis Server normally runs as a service under the local system account. Internet access has to be enabled for this account.

Details

In the example the temperature forecast for Kalchreuth from http://www.wetteronline.de is extracted every 6 hours and written to a datapoint. (The URL for other cities can be determined via http://www.wetteronline.de/homepage/index.html .)

The sample project assumes that the Basic file ElvisINet.bas is stored in the directory C:\Programme\Elvis. If you use another directory, please adjust the parameter in the datapoint port accordingly.

How does this work?

A datapoint port “INTERNET” is configured that can request HTML pages and extract data from them with the help of the Basic script ElvisINet.bas.

The solution is fairly general, so the Basic script can be used unchanged for other cases. The datapoints of the port are read-only and must be configured with a cyclic request.

The technical addresses of the port consist of two parts separated by a vertical bar (“|”): the URL of the page to be requested, followed by a “regular expression” that determines which part of the page is interpreted as the datapoint value.
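As an illustration of this address format, the following Python sketch splits a technical address into its two parts. It is not taken from ElvisINet.bas. The function name split_address, the placeholder URL and the choice to split at the first vertical bar (so that the regular expression itself may contain further bars) are assumptions.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split '<URL>|<regular expression>' at the first vertical bar.

    Splitting at the FIRST bar is an assumption made for this sketch;
    it lets the regular expression contain '|' for alternation.
    """
    url, sep, pattern = address.partition("|")
    if not sep or not url.strip() or not pattern:
        raise ValueError("expected '<URL>|<regular expression>'")
    # Only the URL is trimmed; whitespace may be significant in the pattern.
    return url.strip(), pattern


# Placeholder URL; in the page source the degree sign appears as "&deg;".
url, pattern = split_address(r"http://www.example.com/forecast.html|>(-?\d+)&deg;C")
print(url)      # the page to request
print(pattern)  # the expression applied to the page
```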

URLs of the form “http:…”, “https:…” and “ftp:…” are allowed. Authentication and cookies are not supported at the moment. Check in each case whether (and under which conditions) the web site permits this kind of use!
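A check for the permitted URL schemes could look like the following Python sketch. It only illustrates the rule stated above and does not reproduce how ElvisINet.bas validates its input; the function name is_supported_url is an assumption.

```python
from urllib.parse import urlsplit

# The three schemes named in this article.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_supported_url(url: str) -> bool:
    """Return True if the URL uses one of the permitted schemes."""
    return urlsplit(url).scheme.lower() in ALLOWED_SCHEMES


print(is_supported_url("http://www.wetteronline.de"))  # True
print(is_supported_url("file:///C:/temp/page.html"))   # False
```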

The regular expression (see e.g. http://en.wikipedia.org/wiki/Regular_expression) must contain one group enclosed in parentheses. The string matched by this group is extracted as the value. In the example the regular expression is “>(\d+)&deg;C”, which means in plain words: search for a closing angle bracket (“>”) followed by a number (one or more digits), followed by “&deg;C” (“&deg;” is the HTML code for the degree character), and return the number as the result. This expression does not recognize negative numbers. To handle them, it has to be extended by “-?”, which gives: “>(-?\d+)&deg;C”
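The effect of the two expressions can be tried out with the following Python sketch. It uses Python's re module, whose syntax may differ in detail from the regular expression engine used by ElvisINet.bas. The HTML fragment and the function name extract_value are made up for illustration; in the page source the degree sign is assumed to appear as the HTML code “&deg;”.

```python
import re
from typing import Optional


def extract_value(html: str, pattern: str) -> Optional[str]:
    """Return the text matched by the first group, or None if nothing matches."""
    regex = re.compile(pattern)
    if regex.groups < 1:
        raise ValueError("the expression must contain a group in parentheses")
    match = regex.search(html)
    return match.group(1) if match else None


# Made-up fragment of a forecast page with one negative and one positive value.
sample = '<td class="temp">-3&deg;C</td><td>12&deg;C</td>'

print(extract_value(sample, r">(\d+)&deg;C"))    # 12 (the negative value is skipped)
print(extract_value(sample, r">(-?\d+)&deg;C"))  # -3
```

Only the first match on the page is returned, so the expression should be specific enough to hit the intended value.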

The technical address is limited to 255 characters in Elvis. If this is not sufficient, the following workarounds are possible:

  • If the URL is very long, it can be shortened with a redirect service such as tinyurl.com.
  • ElvisINet.bas could be modified to read the URL and the regular expression from a file; the technical address would then only be a key into that file.
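The second workaround could be sketched as follows in Python. The file format (one “key = URL|regular expression” entry per line, with “#” starting a comment), the key name and the placeholder URL are hypothetical; a modified ElvisINet.bas would have to define its own format.

```python
def load_addresses(text: str) -> dict[str, str]:
    """Parse lines of the form 'key = URL|regex' into a lookup table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at the first '=' only, because URLs may contain '=' themselves.
        key, sep, address = line.partition("=")
        if sep:
            table[key.strip()] = address.strip()
    return table


# Contents of the hypothetical lookup file.
config = r"""
# key = URL|regular expression
kalchreuth = http://www.example.com/forecast.html?city=kalchreuth|>(-?\d+)&deg;C
"""

table = load_addresses(config)
# The short key "kalchreuth" would be entered as the technical address.
print(table["kalchreuth"])
```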