Notes
- Choose the Semiautomatic template creation modality to create templates that extract sets of data structured as they appear formatted on an HTML page.
- Choose the Manual template creation modality to create templates that extract and structure sparse data on an HTML page.
- For long lists of sequence identification codes, set the value of Gap between two extractions (msec) in the Data extraction panel to at least 15000. This avoids overloading the web server of the considered databank with many closely spaced requests; otherwise, the databank web server may permanently block any service to the IP address of the computer in use.
- The updating functionality works properly only if MyWEST has been correctly connected to a relational database.
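The Gap between two extractions setting spaces out the requests sent to the databank. The sketch below only illustrates that idea outside MyWEST: `extract_all`, `fetch` and the injectable `sleep` are hypothetical names, and `fetch` stands in for whatever retrieves one record. It waits the configured gap before every request except the first.

```python
import time
from typing import Callable, Iterable, List

# Hypothetical illustration only: MyWEST applies this spacing itself through
# the "Gap between two extractions (msec)" setting.
GAP_MSEC = 15000  # minimum value recommended for long lists of IDs


def extract_all(
    sequence_ids: Iterable[str],
    fetch: Callable[[str], str],
    gap_msec: int = GAP_MSEC,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch one record per ID, pausing gap_msec between consecutive requests."""
    results = []
    for i, seq_id in enumerate(sequence_ids):
        if i > 0:
            # Pause before every request except the first, so the databank
            # server receives at most one request per gap.
            sleep(gap_msec / 1000.0)
        results.append(fetch(seq_id))
    return results
```

Passing `sleep` as a parameter lets the pacing be checked without actually waiting, e.g. `extract_all(ids, fetch, sleep=pauses.append)` records the pauses instead of sleeping.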
Problem - Solution
P. I. When you select the databank to extract data from in the Web DB selection menu of the MyWEST Template configuration panel, the reference HTML page for the selected databank does not appear in the web browser window.
S. I.1 With some web browsers, you must manually refresh the page shown in the browser window to make the reference HTML page appear.
P. II. When you click the Catch button in the Template configuration panel, a Select a unique Anchor warning window appears.
S. II.1 The anchor must be a sequence of characters that is unique on the reference HTML page.
Repeat the selection of the anchor, choosing a sequence of characters that occurs only once on the page.
For best performance, take the anchor within, or as close as possible to, the data to extract.
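Before pasting an anchor, you can check that it occurs exactly once in the page text. This is a minimal sketch assuming you already have the page content as a string; `count_occurrences` and `is_unique_anchor` are hypothetical helpers, not part of MyWEST. Overlapping matches are counted, so a candidate that overlaps itself is not mistaken for a unique one.

```python
def count_occurrences(page_text: str, candidate: str) -> int:
    """Count (possibly overlapping) occurrences of candidate in page_text."""
    if not candidate:
        raise ValueError("the anchor must not be empty")
    count, start = 0, 0
    while True:
        idx = page_text.find(candidate, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # advance one character so overlapping matches count


def is_unique_anchor(page_text: str, candidate: str) -> bool:
    """True if candidate appears exactly once in page_text."""
    return count_occurrences(page_text, candidate) == 1
```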
P. III. When you click the Catch button in the Template configuration panel, a not found warning window appears.
S. III.1 All characters of each sequence of characters selected, either as anchor or within the data to extract to identify them, must have the same appearance (i.e., they must be selected within the same HTML tag in the HTML code of the page).
Repeat the selection of the sequences of characters, choosing characters with the same appearance on the page.
S. III.2 The sequences of characters selected, either as anchor or within the data to extract to identify them, and pasted into the text fields of the MyWEST template building interface must not end with spaces (i.e., blank characters).
Correct the selection by deleting any spaces at the end of the sequences of characters pasted into the text fields of the MyWEST template building interface.
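If selections are prepared programmatically before being pasted, trailing blanks can be removed automatically. A minimal sketch with hypothetical helper names: Python's `str.rstrip()` with no argument removes all trailing whitespace (spaces, tabs, newlines) while leaving leading characters untouched, which matches the rule that only the end of the sequence must be free of blanks.

```python
def clean_selection(text: str) -> str:
    """Remove trailing blank characters from a selected sequence."""
    # rstrip() without arguments strips every trailing whitespace character
    # but keeps any leading ones.
    return text.rstrip()


def has_trailing_blank(text: str) -> bool:
    """True if the selection ends with at least one whitespace character."""
    return text != text.rstrip()
```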
P. IV. When you click the Catch button in the Template configuration panel, an Extraction problems warning window appears.
S. IV.1 The sequences of characters selected to identify the data to extract must belong to the same single HTML structure in the page that contains the set of data to extract.
Repeat the selection of the sequences of characters, choosing characters that belong to the same HTML structure in the page.
S. IV.2 The sequences of characters selected to identify the data to extract must be distinct (i.e., they must not all identify one single datum).
Repeat the selection of the sequences of characters, choosing distinct sequences.
To extract single sparse data on an HTML page, choose the Manual template creation modality.
P. V. When you click the Catch button in the Template configuration panel, a warning window appears with a table listing all the sequences of characters in the page that are equal to the selected sequences.
S. V.1 At least one of the sequences of characters selected to identify the data to extract is not unique in the considered HTML page.
Unambiguously identify the intended sequence by selecting the corresponding row in the table of the warning window. In this table, each sequence of characters in the HTML page equal to the selected sequence is unambiguously identified by the characters that precede it on the page.
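The idea behind the warning table, distinguishing repeated sequences by the characters that precede them, can be sketched as follows. This is an illustration under stated assumptions, not MyWEST's code: `occurrences_with_context` is a hypothetical helper working on a plain string, and the context length of 20 characters is an arbitrary choice.

```python
from typing import List, Tuple


def occurrences_with_context(
    page_text: str, selected: str, context: int = 20
) -> List[Tuple[int, str]]:
    """Return (position, preceding characters) for each occurrence of selected."""
    if not selected:
        raise ValueError("the selected sequence must not be empty")
    rows = []
    start = 0
    while True:
        idx = page_text.find(selected, start)
        if idx == -1:
            return rows
        # Up to `context` characters before the match tell repeated
        # occurrences apart, like the rows of the warning table.
        rows.append((idx, page_text[max(0, idx - context):idx]))
        start = idx + 1
```

For example, in `"Gene: BRCA1 Organism: human; Gene: TP53 Organism: human"` the two occurrences of `"Organism:"` are told apart by the preceding `"BRCA1 "` and `" TP53 "` when `context=6`.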
P. VI. The extraction obtained at the end of template configuration is not satisfactory.
S. VI.1 Apart from the anchor, the other sequences of characters must be selected within the data to extract, and the extraction parameters must suit the structure of the considered HTML page.
Click the Discard or Cancel button and repeat template configuration, either selecting more adequate sequences of characters or modifying the default extraction parameters used for the entire template.
P. VII. When you click the End button in the top right part of the template building interface in the MyWEST Template configuration panel, template creation ends without saving the template.
S. VII.1 During template configuration, use only the buttons in the Semiautomatic selection (or Manual selection) area of the template building interface.
Click the End button in the top right part of the template building interface of the Template configuration panel only after the name of the newly created template table appears in the Templates tables text area on the left side of the main template building interface of the MyWEST Template configuration panel, or when you want to end template configuration without saving.
© Marco Masseroli, PhD masseroli@biomed.polimi.it - Last update on June 16, 2004 - 16:11:46