MyWEST: My Web Extraction Software Tool
""""""""""""""""""""""""""""""""""""""""
(Andrea Stella, MS and Marco Masseroli, PhD)
http://www.medinfopoli.polimi.it/MyWEST/


DATABASE CONFIGURATION FILE
============================


DATABASE WHERE TO SAVE THE EXTRACTED DATA
___________________________________________

- Set "DRIVER" to the name of the driver used to connect to the database
  that stores the extracted data.
  Examples:
  . For an ODBC connection (the default option, also used for MS-Access
    databases), set "DRIVER=sun.jdbc.odbc.JdbcOdbcDriver".
  . For a direct connection to MySQL Server databases using the driver
    provided with MyWEST, set "DRIVER=com.mysql.jdbc.Driver".
  . For a direct connection to MS-SQL Server 7 databases using the driver
    provided with MyWEST, set "DRIVER=com.inet.tds.TdsDriver".

- Set "DBURL" to the URL used to access the database that stores the
  extracted data.
  Examples:
  . For databases accessed through an ODBC connection, set
    "DBURL=jdbc:odbc:DATABASE_NAME" (e.g. DBURL=jdbc:odbc:DBgene, where
    DBgene is the default name of the MS-Access database storing the
    extracted data).
  . For databases accessed with a specific driver through a direct TCP/IP
    connection, set "DBURL=jdbc:DRIVER_NAME:DATABASE_FULL_URL".
    Examples:
    . DBURL=jdbc:mysql://111.111.111.111:3306/GeneData
      for the GeneData MySQL database listening on the default port 3306
      of a computer with IP address 111.111.111.111, using the MySQL
      driver described above.
    . DBURL=jdbc:inetdae7://111.111.111.111:1433/GeneDB
      for the GeneDB MS-SQL Server 7 database listening on the default
      port 1433 of a computer with IP address 111.111.111.111, using the
      MS-SQL Server 7 driver described above.

- If a user ID and password are required for the connection, set the
  "USER" and "PASSWORD" labels accordingly. Otherwise leave them empty,
  i.e. USER= and PASSWORD= (the default values).

DATABASE SETTINGS
-------------------------------
DRIVER=sun.jdbc.odbc.JdbcOdbcDriver
DBURL=jdbc:odbc:DBgene
USER=
PASSWORD=


PROXY CONFIGURATION
_____________________

To use a proxy for accessing the Web interfaced databanks from which the
data are extracted, set "PROXYSERVER" to the proxy IP address or host name
and "PORT" to the proxy port. If no proxy is used, set
"PROXYSERVER=NOPROXY" and "PORT=". This is the default setting.

PROXY SETTINGS
--------------------------
PROXYSERVER=NOPROXY
PORT=


WEB INTERFACED DATABANKS TO EXTRACT DATA FROM
____________________________________

For each Web interfaced databank to extract data from, set the following
labels.

- Set "NAME" to a unique name for the Web interfaced databank
  (e.g. NAME=GeneCards).

- Set "BASEURL" to the complete URL, excluding the page identifying code,
  used to reach a Web page of the databank that contains data to extract
  (e.g. BASEURL=http://genome-www.stanford.edu/cgi-bin/genecards/cardsearch.pl?search=).

- Set "PARAMETER" to the identifying code of the reference page used to
  create templates for the Web interfaced databank
  (e.g. PARAMETER=H59260, where H59260 is the accession number of the
  nucleotide sequence whose data are contained in the reference page).

- If requesting the URL set in "BASEURL" followed by the code set in
  "PARAMETER" returns an intermediate Web page that contains a link to the
  desired Web page, also set the "URL1" and "LINK" labels.
  . Set "URL1" to the base URL of the following page, excluding the URL
    found inside the tag of the link that identifies the Web page
    containing the data to extract
    (e.g. URL1=http://genome-www.stanford.edu/cgi-bin/genecards/).
  . Set "LINK" to the text of the link that, in the intermediate Web page,
    points to the desired Web page (e.g. LINK=Display). If the text of the
    link is equal to the identifying code used to query the Web interfaced
    databank, set "LINK=@@@@".

- Set "CODE_TYPE" to a short abbreviation for the type of page identifying
  code used in "PARAMETER" (e.g. CODE_TYPE=AN for Accession Number codes,
  CODE_TYPE=LL for LocusLink codes).

- The "END" label marks the end of the settings of a Web interfaced
  databank.

When adding a new Web interfaced databank, make sure that all of its
settings are contained between its "NAME" and "END" labels. Use a
different "NAME" for each Web interfaced databank in the settings list.

WEB INTERFACED DATABANKS SETTINGS
-----------------------------------------
NAME=UniGene
BASEURL=http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ACC=
PARAMETER=M27396
if using proxy add URL1=http://www.ncbi.nlm.nih.gov/UniGene/
if using proxy add LINK=here
CODE_TYPE=AN
END

NAME=LocusLink
BASEURL=http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=
PARAMETER=4898
CODE_TYPE=LL
END

NAME=Swiss-Prot
BASEURL=http://www.expasy.org/cgi-bin/niceprot.pl?
PARAMETER=P09581
CODE_TYPE=SP
END

NAME=SourceSearch
BASEURL=http://genome-www4.stanford.edu/cgi-bin/SMD/source/sourceResult?organism=Hs&option=Number&choice=Gene&criteria=
PARAMETER=T53775
CODE_TYPE=AN
END

NAME=GeneCards
BASEURL=http://bioinfo.weizmann.ac.il/cards-bin/cardsearch.pl?search=
PARAMETER=NM_000135
LINK=Display
URL1=http://bioinfo.weizmann.ac.il/cards-bin/
CODE_TYPE=AN
END

NAME=Mouse Genome Informatics (MGI)
BASEURL=http://www.informatics.jax.org/searches/accession_report.cgi?id=
PARAMETER=MGI:87895
CODE_TYPE=MGI
END
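

ILLUSTRATIVE SKETCH (NOT PART OF THE SETTINGS)
-----------------------------------------
The NAME ... END layout described above can be read with a few lines of
code. The Python sketch below is only an illustration of how the labels
fit together; it is not the parser used by MyWEST, which is not shown in
this file. The names parse_databanks and reference_url are hypothetical,
and the sketch assumes that any line not starting with a known label
followed by "=" (such as the "if using proxy add ..." lines) is ignored.
The reference page URL is taken to be "BASEURL" immediately followed by
"PARAMETER", as the label descriptions state.

```python
# Hypothetical helper names; this is a sketch, not MyWEST's own parser.
KNOWN_LABELS = ("NAME", "BASEURL", "PARAMETER", "URL1", "LINK", "CODE_TYPE")


def parse_databanks(text):
    """Collect each NAME ... END block into a dict of label -> value."""
    databanks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "END":
            # END closes the databank opened by the preceding NAME label.
            if current is not None:
                databanks.append(current)
            current = None
            continue
        # Split at the first "=" only, since BASEURL values contain "=".
        key, sep, value = line.partition("=")
        if not sep or key not in KNOWN_LABELS:
            continue  # assumption: unrecognized lines are skipped
        if key == "NAME":
            current = {"NAME": value}
        elif current is not None:
            current[key] = value
    return databanks


def reference_url(databank):
    """Build the reference page URL: BASEURL followed by PARAMETER."""
    return databank["BASEURL"] + databank["PARAMETER"]


# Sample settings taken from the LocusLink entry above.
sample = "\n".join([
    "NAME=LocusLink",
    "BASEURL=http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=",
    "PARAMETER=4898",
    "CODE_TYPE=LL",
    "END",
])
```

Calling reference_url(parse_databanks(sample)[0]) joins the two values
into http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=4898. Entries
that also define "URL1" and "LINK" would need a second step that follows
the link in the intermediate page, which this sketch does not attempt.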