FIGURE 1 | FIGURE 2 | FIGURE 3 | FIGURE 4
Session 9: Web agents
Programming CF pages
An agent is a piece of software that performs a recurring service according to a timetable, without needing to be requested each time. On the web, an agent resides on a server, from which it serves its clients. CFMX has features that make the language well suited for implementing such services. Although many types of agents exist, we shall consider only two applications, leaving others for you to design.
Advanced agent
Let us consider a more advanced agent. In addition to collecting news regularly from one or more sources, Agent 2 accepts subscriptions for news pages containing topics that the user specifies as keywords. Each time a news page is retrieved from a news source, the agent scans the page to see whether it contains any of the topics requested by the subscribers and, if so, e-mails a copy of the news to those subscribers.
On request, this agent sends a subscription form to a user, asking for his/her name, email address, topic and source(s) of interest, as well as the number of days for which the service is wanted. When the form is submitted, the agent records the subscription and the service starts. In this example, only a single topic per request is possible, but servicing more complex requests is quite feasible.
In the application example demonstrated, two news sources, CNN and Washington Post, are copied every three hours. Each time, the copies are scanned for the keywords provided by the subscribers. Whenever a hit is detected, the agent sends the subscriber(s) who provided the keyword an email containing a copy of the relevant news page.
Figure 2 gives an overview of the application. This agent system consists of 5 logical parts:
* the scheduler
* the news retriever
* the subscription service
* the parser
* the discharger
The parts will be discussed in a slightly different order in the following paragraphs.
Before the agent can be activated, the news sources and retrieving frequency must be specified. We focus on the scheduling and assume here that the 2 news sources, CNN and Washington Post, are already selected. The scheduling is similar to the one we described for the simpler Agent 1 above, and implemented by the following form template:
<!--- schedule_form.cfm --->
<h2><font color="Blue">Scheduling or deleting the news collection</font></h2>
<cfloop index="i" from="1" to="2">
<!--- cfoutput is needed so that #i# is evaluated inside the loop --->
<cfoutput>
<p>a. Scheduling data collection from source #i#:</p>
<form action="schedule.cfm" method="post">
<p>Startdate (mm/dd/yy):<input type="text" name="STARTDATE#i#"></p>
<p>Starttime (hh:mm AM/PM):<input type="text" name="STARTTIME#i#"></p>
<p>Update interval(sec):<input type="text" name="INTERVAL#i#"></p>
<p>End date (mm/dd/yy):<input type="text" name="ENDDATE#i#"></p>
<p>End time (HH:mm AM/PM):<input type="text" name="ENDTIME#i#"></p>
<p>Timeout for request:<input type="text" name="TIMEOUT#i#"></p>
<input type="submit" value="Schedule">
</form>
</cfoutput>
</cfloop>
<p><a href="delete2.cfm">Stop</a> the news collection from both sources.</p>
The schedule_form.cfm template passes control to schedule.cfm, which is identical to the template discussed in connection with the simpler agent. This agent can also be scheduled by means of the CFMX Administrator.
The subscription can be handled by a very simple form, illustrated in Figure 3 and implemented by the service.cfm template:
<FORM action="register.cfm" method="post">
<TABLE>
<TR><TD>Your name:</TD><TD><INPUT name="nname" type="text"></TD></TR>
<TR><TD>Your email address:</TD><TD><INPUT name="email" type="text"></TD></TR>
<TR><TD>The topic in which you are interested:</TD><TD><INPUT name="topic" type="text"></TD></TR>
<TR><TD>Source for the service:</TD><TD></TD></TR>
<!--- The values match the source codes tested by the agent: 1 = CNN, 2 = Washington Post, 3 = both --->
<TR><TD>Washington Post:</TD><TD><INPUT name="source" type="radio" value="2"><br></TD></TR>
<TR><TD>CNN:</TD><TD><INPUT name="source" type="radio" value="1"><br></TD></TR>
<TR><TD>Washington Post and CNN:</TD><TD><INPUT name="source" type="radio" value="3"><br></TD></TR>
<TR><TD>No. of days you want the service:</TD><TD><INPUT name="days" type="text"><br></TD></TR>
<TR><TD></TD><TD><INPUT type="submit" value="Submit"></TD></TR>
</TABLE>
</FORM>
This form resembles a search system in which only one keyword is permitted per search. An improvement would be to permit multiple topic keywords, or topics structured into a complex request, as is possible, for example, in the search engine discussed in a previous session.
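As a minimal sketch of the multiple-keyword improvement, the topic field could be treated as a comma-separated list and each keyword tested in turn. The variables topics, page and hits are hypothetical names introduced for illustration, and the sample strings are invented; in the agent, the page would be the retrieved news content.

```cfml
<!--- Sketch only: "topics" and "page" are hypothetical stand-ins for a
      subscriber's keyword list and a retrieved news page. --->
<cfset topics = "election, budget ,climate">
<cfset page = "Senate passes the new BUDGET after a long debate.">
<cfset hits = "">
<!--- Treat the topic field as a comma-separated list and test each keyword. --->
<cfloop list="#topics#" index="keyword">
  <cfset term = Trim(keyword)>
  <!--- FindNoCase does a literal, case-insensitive substring search. --->
  <cfif Len(term) GT 0 AND FindNoCase(term, page) GT 0>
    <cfset hits = ListAppend(hits, term)>
  </cfif>
</cfloop>
<cfoutput>Matching topics: #hits#</cfoutput>
```

A literal search is used here rather than a regular expression, so that keywords containing characters such as + or ? are matched as typed.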
The content of the subscription form is registered in a table called agent2 in the datasource #session.datasource#, which is set in Application.cfm:
2. <cfset MyDate=now()>
3. <cfset untill=DateFormat(DateAdd('d',form.days,MyDate),'mmmm dd yyyy')>
4. <CFQUERY name="register" datasource="#session.datasource#">
5. INSERT INTO agent2(nname, email, topic, untill, source)
   VALUES('#form.nname#', '#form.email#', '#form.topic#', '#untill#', '#form.source#')
6. </CFQUERY>
7. <cfoutput>
8. <H3><FONT color="blue">#form.nname#<br>
9. Your request has been recorded.</FONT></H3>
10. </cfoutput>
Lines 1-6 take care of the registration in the database. If the registration is successful, Lines 7-10 return a confirmation message to the subscriber.
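The registration assumes that the form fields arrive filled in correctly. The following is a hedged sketch of a guard that could precede the INSERT; the messages and the decision to stop with CFABORT are assumptions for illustration, not part of the original template.

```cfml
<!--- Sketch only: a guard that could precede the INSERT in register.cfm. --->
<!--- A radio button that was never selected is not submitted at all,
      so give both fields a default before testing them. --->
<cfparam name="form.days" default="">
<cfparam name="form.source" default="">
<cfif NOT IsNumeric(form.days) OR form.days LTE 0>
  <p>Please enter the number of days as a positive number.</p>
  <cfabort>
</cfif>
<!--- The subscription form offers the source codes 1, 2 and 3. --->
<cfif NOT ListFind("1,2,3", form.source)>
  <p>Please select a news source.</p>
  <cfabort>
</cfif>
```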
The main template of Agent 2 covers the retrieval, parsing and discharging of news to the subscribers:
2. <cfhttp method="GET" url="http://www.cnn.com/" resolveurl="Yes" timeout="300">
3. </cfhttp>
4. <cffile action="write" file="#application.path#\temp.html" output="#CFHTTP.FileContent#">
5. <cffile action="READ" file="#application.path#\temp.html" variable="retrieved">
6. <cffile action="write" file="#application.path#\retrieved.html" output="#retrieved#">
7. <cfhttp method="GET" url="http://www.washingtonpost.com/" resolveurl="Yes" timeout="300">
8. </cfhttp>
9. <cffile action="write" file="#application.path#\temp.html" output="#CFHTTP.FileContent#">
10. <cffile action="READ" file="#application.path#\temp.html" variable="retrieved2">
11. <cffile action="write" file="#application.path#\retrieved2.html" output="#retrieved2#">
12. <cfquery name="subscribers" datasource="#application.datasource#">
    SELECT nname, email, topic, untill, source FROM agent2
13. </cfquery>
14. <CFSET PresentDate=DateFormat(now(),'mmm dd yyyy')>
15. <CFLOOP query="subscribers">
16. <CFIF PresentDate LT subscribers.untill>
17. <cfif (subscribers.source EQ "1" OR subscribers.source EQ "3") AND (retrieved NEQ "")>
18. <cfset source_name="CNN">
19. <cfset text=Trim(subscribers.topic)>
20. <cfset Position=REFindNoCase(text,retrieved,1)>
21. <cfif Position GT 0>
22. <cfmail
    to="#subscribers.email#"
    from="svein@nordbotten.com"
    type="html"
    server="alf.uib.no"
    subject="News: On #text# from #source_name#">#retrieved#
    </cfmail>
23. </cfif>
24. </cfif>
25. <cfif (subscribers.source EQ "2" OR subscribers.source EQ "3") AND (retrieved2 NEQ "")>
26. <cfset source_name="Washington Post">
27. <cfset text=Trim(subscribers.topic)>
28. <cfset Position=REFindNoCase(text,retrieved2,1)>
29. <cfif Position GT 0>
30. <cfmail
    to="#subscribers.email#"
    from="svein@nordbotten.com"
    type="html"
    server="alf.uib.no"
    subject="News: On #text# from #source_name#">#retrieved2#
31. </cfmail>
32. </cfif>
33. </cfif>
34. </cfif>
35. </cfloop>
This template describes the collection of news from 2 sources. Lines 2-11 take care of retrieving and storing the news from the 2 sources. Note that the retrieved pages are saved differently than in agent.cfm: in some cases, going through a temporary file, temp.html, may make it easier to get the agent to execute acceptably. The template can be made more elegant by looping twice through most of the lines.
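The looping idea can be sketched as follows for the retrieval step. The lists sourceUrls and targetFiles and the array pages are hypothetical names introduced here; the URLs and file names are those of the listing above, and application.path is assumed to be set in Application.cfm.

```cfml
<!--- Sketch only: one pass per news source instead of two copies of the code. --->
<cfset sourceUrls = "http://www.cnn.com/,http://www.washingtonpost.com/">
<cfset targetFiles = "retrieved.html,retrieved2.html">
<cfset pages = ArrayNew(1)>
<cfloop index="i" from="1" to="#ListLen(sourceUrls)#">
  <cfhttp method="GET" url="#ListGetAt(sourceUrls, i)#" resolveurl="Yes" timeout="300"></cfhttp>
  <cffile action="write" file="#application.path#\#ListGetAt(targetFiles, i)#" output="#CFHTTP.FileContent#">
  <!--- Keep each page in memory for the keyword scan that follows. --->
  <cfset pages[i] = CFHTTP.FileContent>
</cfloop>
```

Adding a third source would then only require extending the two lists.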
The query named subscribers in Line 12 selects the subscriber data and stores them in a query object. The CFLOOP query block that follows contains the rest of the lines and is run once for each subscriber. The first line in the loop is a CFIF tag testing whether the stop date for the subscription has passed; if it has, the subscriber is skipped. The regular expression in Line 20 tests the retrieved news page from CNN for the topic. The template continues in a similar way with the news from Washington Post.
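The stop-date test compares two formatted date strings. One possible alternative, sketched below under the assumption that the stored stop date can be read back as a date, is to compare date values with DateCompare. The variables stopDate and today and the sample dates are hypothetical.

```cfml
<!--- Sketch only: the dates are hypothetical sample values. --->
<!--- The stop date is formatted the same way as in the registration template. --->
<cfset stopDate = DateFormat(CreateDate(2004, 3, 15), "mmmm dd yyyy")>
<cfset today = CreateDate(2004, 3, 1)>
<!--- DateCompare returns -1 when its first date is earlier than the second;
      the "d" argument compares down to the day only. --->
<cfif IsDate(stopDate) AND DateCompare(today, stopDate, "d") LT 0>
  <p>Subscription still active.</p>
<cfelse>
  <p>Subscription expired, or the stop date could not be read as a date.</p>
</cfif>
```

The IsDate guard means that a stop date which cannot be interpreted as a date leads to the subscriber being skipped rather than to an error.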
Finally, we need a template to delete the whole service. Template delete2.cfm deletes the tasks for both sources at the same time. It would be easy to introduce an arrangement that permits deleting a single selected source.
<cfschedule action="delete" task="Retrieve1">
<cfschedule action="delete" task="Retrieve2">
<cfoutput>
<H3>Agent task deleted.</H3>
</cfoutput>
The regular expressions in Lines 20 and 28 of the main agent template will be discussed in further detail in session 11.
The scheduling of agents can be difficult to implement. There are, however, several alternatives for scheduling processes. The ColdFusion Administrator has been referred to several times, but it requires that the developer has access to this utility.
Another possibility is to use the meta tag, e.g. <META HTTP-EQUIV="REFRESH" content="18000">, at the top of the agent.cfm template and drop the scheduling template. With this example, the agent template is executed every 18000 seconds, i.e. every 5 hours. This alternative is simple, but it requires that the administrator of the agent keeps the template running in a browser for as long as the service is offered.
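A minimal sketch of an agent.cfm driven by the meta tag could look as follows. The surrounding HTML skeleton and the "Last collected" line are assumptions added for illustration; application.path is assumed to be set in Application.cfm.

```cfml
<!--- Sketch only: agent.cfm reloaded by the browser instead of by CFSCHEDULE. --->
<html>
<head>
<!--- 18000 seconds = 5 hours; the browser requests this page again after that time. --->
<META HTTP-EQUIV="REFRESH" content="18000">
</head>
<body>
<cfhttp method="GET" url="http://www.cnn.com/" resolveurl="Yes"></cfhttp>
<cffile action="write" file="#application.path#\retrieved.html" output="#CFHTTP.FileContent#">
<!--- Shows the administrator when the news was last collected. --->
<cfoutput><p>Last collected: #TimeFormat(now(), "HH:mm")# on #DateFormat(now(), "mm/dd/yy")#</p></cfoutput>
</body>
</html>
```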
A third alternative for Windows is to select Settings -> Control Panel -> Scheduled Tasks, which offers a wizard for running tasks, such as the agents, at specified times.
The CFHTTP tag opens the way for developing a number of different Internet agents. Information can be downloaded from remote hosts, as in our agent examples, but information can also be uploaded from a local host to a number of remote hosts. For example, a news agency has its own corps of field reporters who upload their news from their laptops to the agency's host computer as soon as they finish their stories. Newspapers around the country can subscribe to the agency's news according to specified time and topic profiles. The agency host automatically transfers the new stories to pre-set folders on the newspaper hosts according to these time schedules and topic profiles. Each newspaper can then process the stories received in its folder according to its own editorial policy.
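The upload direction can be sketched with CFHTTP using the POST method together with CFHTTPPARAM. The receiving URL, the field names title and story, and the sample texts are hypothetical; a real receiving host would need a template that accepts these form fields.

```cfml
<!--- Sketch only: the receiving URL and the field names are hypothetical. --->
<cfset storyTitle = "Harbour bridge reopens">
<cfset storyText = "The bridge reopened to traffic this morning.">
<cfhttp method="POST" url="http://newspaper.example.com/receive.cfm" timeout="60">
  <!--- Each CFHTTPPARAM adds one form field to the POST request. --->
  <cfhttpparam type="FormField" name="title" value="#storyTitle#">
  <cfhttpparam type="FormField" name="story" value="#storyText#">
</cfhttp>
<!--- CFHTTP.StatusCode holds the status reported for the request. --->
<cfoutput>Remote host answered: #CFHTTP.StatusCode#</cfoutput>
```

Looping over a list of subscriber hosts with the same tag would give the distribution scenario described above.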
A well-known type of agent is the stock exchange monitoring agent. Such agents monitor stock exchange values continuously and automatically signal crucial changes to the subscribers of their services.
Spider agents crawling around the Internet are another well-known application. In spider applications, the agent visits a set of already recorded web sites, parses the pages for relevant content, saves that content, follows identified links to new pages, and repeats the parsing. Some spiders revisit already recorded pages and compare them with the stored copies in order to update them.
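The link-following step of a spider can be sketched as below. This is an illustration under stated assumptions, not a complete spider: the sample page string stands in for CFHTTP.FileContent, the variable names are hypothetical, and the simple pattern only recognises links written as href="..." with double quotes.

```cfml
<!--- Sketch only: "page" is a hypothetical stand-in for CFHTTP.FileContent. --->
<cfset page = '<a href="http://www.example.com/a.html">A</a> <a href="/b.html">B</a>'>
<cfset links = ArrayNew(1)>
<cfset start = 1>
<cfloop condition="start GT 0">
  <!--- With the fourth argument set to true, REFindNoCase returns the positions
        and lengths of the whole match and of the parenthesised subexpression. --->
  <cfset match = REFindNoCase('href="([^"]+)"', page, start, true)>
  <cfif match.pos[1] EQ 0>
    <!--- No further link found: leave the loop. --->
    <cfset start = 0>
  <cfelse>
    <!--- Element 2 describes the subexpression, i.e. the link target itself. --->
    <cfset ArrayAppend(links, Mid(page, match.pos[2], match.len[2]))>
    <cfset start = match.pos[1] + match.len[1]>
  </cfif>
</cfloop>
<cfoutput>
<cfloop index="i" from="1" to="#ArrayLen(links)#">#links[i]#<br></cfloop>
</cfoutput>
```

Each collected link could then be fetched with CFHTTP and parsed in the same way.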
The first application scenario is simple. Assume the existence of a news agency connected to the net, which frequently updates a news page covering a large range of different events. An organization running an intranet wants to offer its local users a mirror of the news agency page, so that they need not visit the internet unnecessarily. One solution is to implement a web agent that periodically scans the news agency page and creates and maintains a local mirror, which the local users can access.
In principle, the agent outlined is composed of:
1. the retriever controlled by the scheduler, and
2. the dispatcher
The retriever collects and updates the news data according to a frequency controlled by the scheduler. The time scheduler sets start and end times as well as the frequency of the news collection. The dispatcher returns the news on demand from the clients.
In the applications discussed so far in this course, activities were carried out when requested. A web agent application can work independently of requests, but it requires adequate server resources; otherwise the server may become so occupied with agent tasks that it neglects on-demand calls from clients.
Agent 1
In the first application, Agent 1, a very basic service is discussed. A more advanced agent, Agent 2, is discussed in the last part of this session.
The first task considered is to retrieve the news from the CNN net service once every two hours, maintaining an updated local copy that clients can request on demand.
The application includes templates for the following tasks:
* the news collecting and maintenance part
* the setting of the scheduling part
* the news service part
Figure 1 outlines the scenario.
The core of the first template is the powerful CFHTTP tag which can be compared with a CFQUERY tag, but with the important difference that the CFHTTP tag queries other web servers, and not a database.
2. <cfhttp method="GET" url="http://www.cnn.com/" resolveurl="Yes" >
3. </cfhttp>
4. <cffile action="write" file="c:#application.path#\retrieved.html" output="#CFHTTP.FileContent#">
The CFHTTP tag in Line 2 indicates that we want to retrieve the CNN entry page in order to establish a local mirror of the CNN page. The attribute RESOLVEURL, set to "yes", resolves URLs within the mirror page so that they also function in the client environment. The page is returned in a variable called CFHTTP.FileContent.
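Besides CFHTTP.FileContent, the tag also returns CFHTTP.StatusCode. As a hedged sketch, the agent could write the mirror file only when the request succeeded, so that a failed retrieval does not overwrite the previous copy. The timeout value and the log file name "agent" are assumptions for illustration.

```cfml
<!--- Sketch only: write the mirror file only when the request succeeded. --->
<cfhttp method="GET" url="http://www.cnn.com/" resolveurl="Yes" timeout="60"></cfhttp>
<!--- CFHTTP.StatusCode is a string such as "200 OK". --->
<cfif Left(CFHTTP.StatusCode, 3) EQ "200">
  <cffile action="write" file="#application.path#\retrieved.html" output="#CFHTTP.FileContent#">
<cfelse>
  <!--- Keep the previous copy and note the failure in a ColdFusion log file. --->
  <cflog file="agent" text="News retrieval failed: #CFHTTP.StatusCode#">
</cfif>
```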
The content is saved in Line 4 to maintain a file named retrieved.html. The particular disk address at which the content is stored can be set in Application.cfm as an application-wide variable, application.path.
The agent administrator will need a form to specify the start and end times and the frequency of the news collection. This form is used only once for starting up the agent's activity, and possibly again for terminating the service earlier than specified. The template taking care of this task is:
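A minimal sketch of such an Application.cfm is given below. Every value is a hypothetical placeholder: the application name, the directory and the datasource name must be replaced by those of the actual installation.

```cfml
<!--- Application.cfm: sketch only; every value below is a hypothetical placeholder. --->
<!--- Application and session variables require a named application. --->
<cfapplication name="NewsAgent" sessionmanagement="Yes">
<!--- Directory in which the agent stores the retrieved pages. --->
<cfset application.path = "\inetpub\wwwroot\agent">
<!--- Datasource holding the agent2 subscription table. --->
<cfset application.datasource = "agentdb">
<cfset session.datasource = application.datasource>
```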
1. <!--- schedule_form.cfm --->
2. <h2><font color="Blue">Scheduling or deleting the news collection</font></h2>
3. <p>a. Scheduling data collection:</p>
4. <form action="schedule.cfm" method="post">
5. <p>Startdate (mm/dd/yy):<input type="text" name="STARTDATE"></p>
6. <p>Starttime (hh:mm AM/PM):<input type="text" name="STARTTIME"></p>
7. <p>Update interval(sec):<input type="text" name="INTERVAL"></p>
8. <p>End date (mm/dd/yy):<input type="text" name="ENDDATE"></p>
9. <p>End time (HH:mm AM/PM):<input type="text" name="ENDTIME"></p>
10. <p>Timeout for request:<input type="text" name="TIMEOUT"></p>
11. <input type="submit" value="Schedule">
12. </form>
13. <p><a href="delete.cfm">Stop </a> the news collection. </p>
Note that the template refers to 2 actions: scheduling the agent in Lines 3-12, and deleting an existing time schedule in Line 13.
The schedule.cfm template for processing the scheduling data is short:
<cfschedule action="UPDATE"
task="Agent"
operation="HTTPRequest"
startdate="#startdate#"
starttime="#starttime#"
interval="#interval#"
enddate="#ENDDATE#"
endtime="#ENDTIME#"
url="http://#application.url#/agent.cfm"
resolveurl="Yes" requesttimeout="120">
The template is simple, but has a number of attributes, some of which, such as ENDDATE and ENDTIME, are optional. It contains a variable, application.url, which must be set in Application.cfm.
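As a concrete illustration, the schedule for Agent 1, which collects news every two hours, could be written with literal values as sketched below. The dates, times and URL are hypothetical sample values; the interval is given in seconds, as in the scheduling form.

```cfml
<!--- Sketch only: the dates, times and URL are hypothetical sample values. --->
<!--- Every two hours expressed in seconds: 2 * 60 * 60 = 7200. --->
<cfset twoHours = 2 * 60 * 60>
<cfschedule action="UPDATE"
  task="Agent"
  operation="HTTPRequest"
  startdate="01/15/04"
  starttime="08:00 AM"
  interval="#twoHours#"
  enddate="02/15/04"
  endtime="08:00 PM"
  url="http://www.example.com/agent/agent.cfm"
  resolveurl="Yes"
  requesttimeout="120">
```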
If you have access to the CFMX Administrator, the schedule can alternatively be set by means of that utility. However, if you are renting time from an ISP, you will usually not have access to the Administrator feature.
If the stop option is selected in the schedule_form.cfm, the following template is run:
<cfschedule action="delete" task="Agent">
<cfoutput>
<H3>Agent task deleted.</H3>
</cfoutput>
The final component of the news service is a form for the clients to request the news:
2. <h2><font color="Blue">Information collection by agent</font></h2>
3. <p>Do you want a <font color="Red">news</font> report? <a href="retrieved.html">Yes</a>/ No.</p>
where the link in Line 3 is all that is needed to respond to the request by serving the page retrieved.html.
In a real situation, a more elaborate form would be designed, but for demonstrating the principles of web agents, this simple form will do. As usual, you will find a link to an implemented example of the agent at the end of the session.
Exercises
a. The CFHTTP tag used in this session is discussed in Chapter 14 in RBB. Study the text; it will give you further ideas about the many possible applications of this tag.
b. The examples in this session are simple agent applications. Using the powerful CFHTTP tag with its many optional attributes, complex agents can be implemented. Agents surveying the exchange market are popular. Using the method "POST" also makes it possible for an agent to work as a distributor: the agent is then scheduled to release and distribute messages from another process to a list of remote servers as well as clients.
c. CFSCHEDULE is described in Chapter 22 of RBB. Even though this tag is mainly related to application server management, we have demonstrated that it is clearly also of interest to the application programmer.
d. If you have some ideas for developing agents, write them down and post them on the message board.
Link to the session examples.
Link to the session test.