
Knowledge Base

KB050097 | EXTRACTING DATA FROM WEB PAGES

Product: Elvis
Last updated: 2006-03-14

Summary

This example shows how to use the “Custom Port” to make data from websites available as data points.
In the example, the temperature forecast for Kalchreuth is read from www.wetteronline.de.

Download sample project and script: inet.zip (0.1 MB)

Note: The Elvis server often runs as a service under the local system account. This account must also have Internet access, otherwise the port cannot receive data from the Internet.

Details

In the example, the forecast temperature for Kalchreuth is retrieved from http://www.wetteronline.de every 6 hours and written to a data point. (The corresponding URL for other cities can be found via http://www.wetteronline.de/homepage/index.html.)

The sample project assumes that the Basic file ElvisINet.bas is located in the C:\Program Files\Elvis directory. If you use a different directory, adjust the corresponding parameter of the data point port.

How does it work?

A data point port named “INTERNET” has been set up. Using the Basic file ElvisINet.bas, it can request HTML pages and extract data from them.

The solution is quite general, so it may be possible to use the Basic file unchanged. The data points of this port are read-only and must be connected to a cyclic query.

A technical address for this port consists of two parts: the URL of the HTML page to be read and, separated from it by a vertical bar (|), a “regular expression” that determines which part of the page is taken as the data point value.
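The address format can be illustrated with a minimal Python sketch. It is not the code in ElvisINet.bas; the function name `split_technical_address` and the example URL are assumptions made for illustration, as is the choice to split at the first vertical bar so that a bar inside the regular expression (alternation) stays part of the expression.

```python
def split_technical_address(address: str) -> tuple[str, str]:
    """Split '<URL>|<regular expression>' into its two parts.

    Assumption: the first vertical bar is the separator, so any further
    bars belong to the regular expression.
    """
    url, separator, pattern = address.partition("|")
    if not separator or not url or not pattern:
        raise ValueError("expected '<URL>|<regular expression>'")
    return url, pattern


# Hypothetical address, not taken from the sample project.
url, pattern = split_technical_address(
    r"http://www.example.com/weather.html|>(-?\d+)&deg;C"
)
print(url)
print(pattern)
```

A raw bar in the URL itself would break this scheme, so such a URL would need the bar percent-encoded as %7C.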

URLs of the form “http:…”, “https:…” and “ftp:…” are permitted. Authentication and cookies are currently not supported. Whether and under what conditions a website may be used for the desired purpose must be clarified on a case-by-case basis!
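A pre-check of a URL against these restrictions might look like the following Python sketch. The helper name `is_supported_url` is hypothetical, and rejecting URLs with embedded credentials is an assumption derived from the statement that authentication is not supported; the actual behaviour of the port may differ.

```python
from urllib.parse import urlsplit

# Schemes named in the text above.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_supported_url(url: str) -> bool:
    """Return True if the URL uses an allowed scheme and carries no credentials."""
    parts = urlsplit(url)  # urlsplit normalizes the scheme to lower case
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return False
    # Assumption: 'user:password@host' URLs are rejected because
    # authentication is not supported.
    return parts.username is None


print(is_supported_url("http://www.wetteronline.de/"))
print(is_supported_url("file:///C:/temp/page.html"))
```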

The regular expression (see e.g. http://de.wikipedia.org/wiki/Reguläre_Ausdrücke) must contain a group enclosed in parentheses. The string matched by this group is extracted as the value. In the example, the regular expression is “>(\d+)&deg;C”, which means in plain language: find a closing angle bracket followed by a number (one or more digits), followed by “&deg;C” (“&deg;” is the HTML notation for the degree sign), and return the number as the result. The expression is deliberately simple and only intended to illustrate the principle; for example, it does not recognize negative numbers.
To match negative numbers as well, add “-?” in front of “\d+”. The expression then looks like this: “>(-?\d+)&deg;C”
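The effect of the optional minus sign can be tried out with a short Python sketch. It assumes that the page source writes the degree sign as the entity `&deg;` and that the number directly follows a closing angle bracket; the HTML fragment and the function name `extract_value` are made up for illustration, and Python's `re` module stands in for whatever regular expression engine ElvisINet.bas uses.

```python
import re


def extract_value(html: str, pattern: str):
    """Return the text matched by the first group, or None if nothing matches."""
    match = re.search(pattern, html)
    return match.group(1) if match else None


# Made-up fragment of page source with a negative temperature.
sample = '<td class="temp">-3&deg;C</td>'

print(extract_value(sample, r">(\d+)&deg;C"))    # None: the minus sign prevents a match
print(extract_value(sample, r">(-?\d+)&deg;C"))  # -3
```

The extracted value is a string; converting it to a number is left to the consumer of the data point.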

A technical address in Elvis can be at most 255 characters long. If this is not enough, there are the following workarounds:

  • If the URL is very long, it can be shortened with a redirect service such as tinyurl.com.
  • ElvisINet.bas could be modified to read the URL and regular expression from a file. The technical address would then only be a key into this file.
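The second workaround could be sketched in Python as follows. The file format (one `key = URL|regular expression` entry per line, `#` for comments), the key name and the function name `load_address_table` are all hypothetical; the article does not specify how such a file would look.

```python
def load_address_table(text: str) -> dict[str, str]:
    """Parse lines of the form 'key = URL|regex' into a lookup table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at the first '=' only, so '=' inside the URL is preserved.
        key, separator, address = line.partition("=")
        if separator:
            table[key.strip()] = address.strip()
    return table


# Hypothetical file content; in practice it would be read from disk.
file_content = r"""
# key = URL|regular expression
kalchreuth_temp = http://www.example.com/forecast?city=kalchreuth&days=3|>(-?\d+)&deg;C
"""

addresses = load_address_table(file_content)
print(addresses["kalchreuth_temp"])
```

With such a table, the technical address entered in Elvis would only be the short key, for example `kalchreuth_temp`, and the 255-character limit would no longer constrain the URL or the expression.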

IT GmbH · An der Kaufleite 12 · D-90562 Kalchreuth

© Copyright 2024. IT GmbH | Webdesign by Appear Online