Client-side proxies

Master's thesis, May 2000

Tomas Viberg 

2 Background

The most obvious starting point for a survey of related works is to look at other works with a similar comparative approach to the client-side proxy architecture for content processing. However, there do not seem to be any, so instead this background will survey documentation about using client-side proxy servers as a fundamental part of application architecture. What will primarily be examined is which tasks the proxy is used for, notable details of the proxy architecture, deployment experiences and, where this is discussed, the reasons why the proxy approach was preferred.

2.1 The original proxy

One original function of proxy servers is to intercept communication between client applications and remote servers in order to improve network efficiency through caching. Since network congestion is not a diminishing problem, this continues to be an important function of proxy servers [Thaler and Ravishankar 98]. As all requests go through the proxy, documents that are requested frequently can be stored locally for later use, decreasing both the response time experienced by users and the overall network load of subsequent requests.
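The caching behaviour described above can be illustrated with a minimal sketch. It assumes an in-memory store keyed by URL and a stand-in fetch function in place of a real network request; the names CachingProxy and fake_origin are hypothetical, and expiry and validation of cached documents are deliberately left out.

```python
from typing import Callable, Dict


class CachingProxy:
    """Hypothetical in-memory cache placed in front of a fetch function."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # stands in for the real request to a remote server
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        # Serve a stored copy if this URL has been requested before.
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        # Otherwise fetch from the origin and keep a copy for later requests.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_origin(url: str) -> bytes:
    # Placeholder for a remote server (assumption for illustration only).
    return ("content of " + url).encode("utf-8")


caching_proxy = CachingProxy(fake_origin)
first = caching_proxy.get("http://example.org/a")
second = caching_proxy.get("http://example.org/a")  # answered from the cache
```

Only the first request reaches the origin; the second is answered locally, which is where the reduction in response time and network load comes from.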

Providing caching through a proxy is a natural choice. The proxy provides a service that is transparent to the user as well as to client and server applications. Transparency can be beneficial since users probably are more interested in the service provided than in the particulars of its functionality. Transparency also allows users to share a single proxy easily, for example on a local area network. It is in this situation that the biggest gains of a caching proxy are realised.

There will be no in-depth discussion of traditional caching functionality, since the focus of this work is on proxies working locally as single-user applications. At the same time, caching functionality in a client-side proxy could prove beneficial to the individual user, for example through increased browser independence. The user gets more control over what is stored locally, can switch clients and still have access to the same cached documents, and gets a consistent way to view documents off-line, regardless of client support. Another reason to include caching functionality in client-side proxies with other tasks is that these tasks themselves might result in increased response time. When caching is discussed, it will be in this context, as a way to improve the efficiency of client-side proxies.

2.2 A more versatile approach

Moving away from the traditional view of proxy servers towards the kind of proxies examined in this thesis, [Brooks et al 96] "generalise the notion of proxy servers to construct application-specific proxies that act as transducers on the HTTP stream". Normally, clients and servers expect that requested documents remain unchanged during transport from server to client, even if they are cached copies. The motivation for this transgression is that substantial value can be added by working directly on HTTP streams to view and alter the contents. The stream transducers, called OreOs, can have practically any functionality. Implemented examples include URL validation, measuring network performance, creating group histories, supporting group annotation of documents and creating full-text indexes of accessed documents.

Every OreO is a specialised stream processor, with the freedom to use information from any obtainable source and to produce arbitrary output. The architecture is modular, aimed at facilitating sophisticated behaviour by aggregating highly specialised modules. This is supported by the ability to place OreOs in a chain, so that the output from one is the input for another. This kind of system can be configured with high granularity and set up to support the specific needs of different classes of users, from individuals through groups to enterprises and the public.
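The idea of chaining specialised stream processors can be sketched as follows. This is not the OreO implementation; it only assumes that each transducer is a function from bytes to bytes, and the names chain, measure and annotate are hypothetical.

```python
from typing import Callable, Iterable, List

# A transducer is modelled here as a function from a byte stream to a byte stream.
Transducer = Callable[[bytes], bytes]


def chain(transducers: Iterable[Transducer]) -> Transducer:
    """Combine transducers so that the output of one is the input of the next."""
    stages = list(transducers)

    def run(stream: bytes) -> bytes:
        for stage in stages:
            stream = stage(stream)
        return stream

    return run


seen_sizes: List[int] = []


def measure(stream: bytes) -> bytes:
    # A monitoring stage: records the size and passes the stream on unchanged.
    seen_sizes.append(len(stream))
    return stream


def annotate(stream: bytes) -> bytes:
    # A content-altering stage: appends a (hypothetical) group annotation.
    return stream + b"\n<!-- annotated by group -->"


pipeline = chain([measure, annotate])
result = pipeline(b"<p>hello</p>")
```

Each stage stays small and single-purpose, and more sophisticated behaviour comes from the order in which the stages are aggregated.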

Introducing processing modules in the content stream affects the performance of network transactions, especially if many modules operate on the same stream. However, during tests the delay caused by introducing OreOs in the stream was mostly so small that users hardly perceived it, as they were accustomed to variations in network performance. The delay naturally depends on the efficiency and complexity of the different OreOs, but if it is kept small it need not be a major problem, especially if the added value is substantial enough.

A proposed architectural improvement is to encapsulate the content stream using a higher level of abstraction than the current low-level byte stream. This would probably help third-party developers increase their productivity, and this is indeed a notion supported by several client-side proxies today. Other issues of interest are how to achieve the benefits of a modular approach and ways to minimise the impact on performance.

2.3 Some examples

One system using the notion of proxy servers described above is Crowds [Reiter and Rubin 99]. It enables users to retrieve Web content anonymously, using a client-side proxy server as the backbone of functionality. The idea is to create crowds of users and relay requests through a chain of proxies in the crowd. Neither the addressed server nor the proxies along the relay path can be sure who originally sent the message. Why the proxy solution was chosen is not explicitly stated. A reasonable assumption is that it was chosen because the task at hand is to intercept communication between the client (browser, FTP client, etc.) and the server transparently.

Experiences from deployment of the Crowds system have shown that there are some potential drawbacks to the proxy approach. As already mentioned, any intermediary might slow down the retrieval of content and/or result in increased network traffic. If the proxy is aimed at improving network efficiency this is not an issue, but that is not the goal of the Crowds system, and so there will be some performance degradation.

There could also be problems when trying to use client-side proxies behind firewalls or other security constructs. In the Crowds system, the proxies communicate through non-standard network ports, which might be disallowed. A related problem is that system administrators often want to monitor user communication. However, monitoring users in a crowd is not easy, which could inspire administrators to forbid the use of such systems. This is not a problem directly related to the use of proxies, but since several existing proxies are used for enhancing the privacy of their users, it is an interesting question. These and related legal, moral and ethical questions will be discussed further in later sections.

Pavilion is a framework for developing collaborative web-based applications [McKinley et al 99]. An important part of the framework is a client-side proxy server, with both traditional proxy functionality like caching frequently requested pages and tunnelling content through firewalls, and functionality that is more versatile. The default behaviour of the Pavilion proxy is to provide a group with a common view, for example, allowing several users to automatically view the same document as the group's leader. This is achieved by multicasting information from the leader proxy to the other proxies in the group. Apart from this, the Pavilion framework uses the notion of extensible proxies, meaning that external modules can be attached to the proxy as plug-ins to facilitate type-specific processing of requests and resources before their delivery to the client application. This architectural detail is interesting since it facilitates processing of the actual content flowing through the proxy, as opposed to proxies that simply relay requests and replies, ignoring the content. Through this, Pavilion realises the notion of a content-altering proxy.
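The notion of an extensible proxy with type-specific plug-ins can be sketched as below. This is not the Pavilion API; it only assumes that plug-ins are registered per content type and applied to a resource before delivery. ExtensibleProxy, register, deliver and shout are hypothetical names.

```python
from typing import Callable, Dict

# A plug-in is modelled here as a function that transforms a resource body.
Plugin = Callable[[bytes], bytes]


class ExtensibleProxy:
    """Hypothetical proxy core that dispatches resources to plug-ins by type."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, content_type: str, plugin: Plugin) -> None:
        self._plugins[content_type.lower()] = plugin

    def deliver(self, content_type: str, body: bytes) -> bytes:
        # Ignore parameters such as "; charset=utf-8" when choosing a plug-in.
        base_type = content_type.split(";", 1)[0].strip().lower()
        plugin = self._plugins.get(base_type)
        # Resources without a matching plug-in are relayed unchanged.
        return plugin(body) if plugin else body


def shout(body: bytes) -> bytes:
    # Toy plug-in for illustration: upper-cases plain text.
    return body.upper()


plugin_proxy = ExtensibleProxy()
plugin_proxy.register("text/plain", shout)
```

For example, plugin_proxy.deliver("text/plain; charset=utf-8", b"hi") passes through the plug-in, while an image/png resource is relayed as is.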

Apart from the proxy server, Pavilion also offers interfaces to popular web browsers and protocols for floor control and multicast delivery of content, both aimed at facilitating distributed collaboration. Browser integration is achieved with operating system-specific inter-process communication mechanisms. This is an approach with possible negative effects on the platform and browser independence of a system using the Pavilion framework.

In the context of this work, the Pavilion framework raises two issues to be examined further. First, the merits of extensible proxy servers will be discussed in more detail in subsequent sections. Second, the question of whether browser integration is desirable, and if so, how it should be done, will also receive attention.

Browser integration is also an issue in WebMate, a system for helping users browse and search the web more effectively [Chen and Sycara 98]. WebMate uses a local stand-alone proxy server to monitor and learn from the browsing and searching behaviour of the user. This system provides a relatively close integration with the client's browser environment, not by using browser or platform dependent methods but by inserting the user interface directly into the requested document. The user can interact with the system through a controller applet at the bottom of each document, supplying interests, providing relevant information for processing and receiving help. Whether or not this is a better solution to browser integration than the one Pavilion provides will be examined later.
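Inserting a user interface into the requested document can be sketched as a simple rewrite of the HTML before it reaches the browser. The sketch assumes the controller is placed immediately before the closing body tag; the markup in CONTROLLER_HTML is a placeholder, not WebMate's actual applet code, and insert_controller is a hypothetical name.

```python
import re

# Placeholder markup standing in for the controller (assumption for illustration).
CONTROLLER_HTML = '<div id="controller">[controller placeholder]</div>'

_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def insert_controller(html: str, controller: str = CONTROLLER_HTML) -> str:
    """Insert the controller markup at the bottom of the document."""
    match = None
    for match in _BODY_CLOSE.finditer(html):
        pass  # keep only the last closing body tag
    if match is None:
        # No closing body tag found: append at the very end instead.
        return html + controller
    return html[:match.start()] + controller + html[match.start():]


page = "<html><body><p>x</p></BODY></html>"
rewritten = insert_controller(page)
```

Because the interface travels inside the document itself, no browser- or platform-specific mechanism is needed, at the cost of altering every page the user views.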

The WebMate proxy is used for more demanding tasks than in previously described systems. Intercepting communication between server and client is one of the functions, but the content of this communication is not altered in any significant way. Communication patterns and user feedback are processed with machine-learning algorithms to build and refine a model of user interests based on keywords describing relevant documents. Through this model, WebMate can automatically provide documents of interest to the user. Another task is to increase the quality and relevance of search results through criteria refinement and keyword expansion. Both these tasks require advanced functionality and algorithms, and this functionality is implemented directly in the WebMate proxy rather than provided as plug-ins to a modular proxy server.

2.4 Proxies in mobile environments

To revisit the more traditional proxies, one common use is to provide a bridge between different transfer protocols. For example, a web browser lacking knowledge of the gopher protocol can access gopher-based material through a proxy. The proxy acts as a translator, speaking HTTP to the browser and gopher to the server. Taking this a step further, a proxy can act as a connection between fundamentally different environments such as stationary and mobile environments. One current example of this approach is the Wireless Application Protocol [WAP 00], which uses proxy servers to adapt standard Web content to mobile devices through negotiation and translation. Most traditional techniques assume that the location of clients and the client-server connection remain unchanged during communication sessions, which is obviously not the case in mobile environments [Jing et al 99]. The mobility of clients, differences in display technology and the relatively low bandwidth of wireless links are some of the factors that must be taken into account when adapting content from stationary networks to the needs of mobile users.
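As a small illustration of the protocol-bridging role mentioned at the start of this section, the sketch below translates a single gopher menu line into an HTML link that a browser can display. It assumes the usual menu line layout (item type character, display string, selector, host and port, separated by tabs) and does no error handling for malformed lines; gopher_line_to_html is a hypothetical helper name.

```python
from html import escape


def gopher_line_to_html(line: str) -> str:
    """Translate one gopher menu line into an HTML anchor element."""
    # The first character is the item type; the rest is tab-separated.
    item_type, rest = line[0], line[1:]
    display, selector, host, port = rest.rstrip("\r\n").split("\t")[:4]
    url = f"gopher://{host}:{port}/{item_type}{selector}"
    # Escape both the URL and the display string before embedding them in HTML.
    return f'<a href="{escape(url, quote=True)}">{escape(display)}</a>'


menu_line = "0About this server\t/about.txt\tgopher.example.org\t70\r\n"
anchor = gopher_line_to_html(menu_line)
```

A full bridge would also fetch the menu from the gopher server and wrap the translated lines in a complete HTML document; only the translation step is shown here.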

Adaptation of communication and content can be made mobile-aware using different techniques, of which transparent proxy-based adaptation is of most interest here. The proxies are rarely pure client-side proxies, since stable wireless communication often requires processing on both the mobile client and in the stationary network. Client-side proxies running on mobile devices still play an important role, providing an interface to regular servers and attempting to shield the negative effects of the mobile environment from applications and users. Transparent caching, prefetching of requested documents and support for disconnected operations are among the tasks performed by mobile client-side proxies.

Transparent adaptation to mobile environments might be detrimental to overall functionality and performance, since it is very difficult to meet the diverse needs of different applications not themselves mobile-aware. Allowing the affected applications to control parts of the adaptation process might prove useful. This issue is not specific to mobile environments, and the benefits and drawbacks of transparency will be discussed further.

2.5 A great diversity

The overall impression of the background survey is that there is a great diversity of choices made in the design of systems using client-side proxies, regarding both functionality and fundamental architecture. Despite this, one possible conclusion is that client-side proxies are most useful when the task at hand involves monitoring or altering the communication between clients and servers. This will serve as a starting-point for resolving when a proxy approach is appropriate and when it is not. Supposing such an approach is preferable, other important issues can be identified and must be evaluated.

One issue is whether the proxy architecture should be monolithic or modular. This touches on the subject of creating sophisticated behaviour by aggregation and whether this should be supported by chaining, extensibility or not at all. If a proxy is extensible, how to present the content to developers of additional functionality is a relevant issue. Should a developer have access to the content as a low-level byte stream, or should the proxy parse the stream to provide a higher level of abstraction, such as wrapper objects for individual HTML elements? How to avoid performance degradation and the level of transparency are other issues related to application architecture. Also of interest is whether a proxy should be integrated with or independent of browsers and operating systems and, as a related issue, how to support user interaction. Privacy concerns and legal, moral and ethical considerations are also questions that will be examined in the remainder of this thesis.
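The difference between a low-level byte stream and a higher level of abstraction can be illustrated with a minimal sketch. It assumes the proxy decodes the stream and parses it with Python's standard html.parser module; the Element wrapper, ElementCollector and parse_elements are hypothetical names, not part of any surveyed system.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional


@dataclass
class Element:
    """Hypothetical wrapper object for one HTML start tag."""
    tag: str
    attrs: Dict[str, Optional[str]]


class ElementCollector(HTMLParser):
    """Collects a wrapper object for every start tag in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []

    def handle_starttag(self, tag, attrs):
        self.elements.append(Element(tag, dict(attrs)))


def parse_elements(raw: bytes, encoding: str = "utf-8") -> List[Element]:
    # The proxy, not the module developer, deals with decoding and parsing.
    collector = ElementCollector()
    collector.feed(raw.decode(encoding, errors="replace"))
    collector.close()
    return collector.elements


raw = b'<p>See <a href="http://example.org/">this</a></p>'
# A module developer can now ask for links without scanning raw bytes.
links = [e.attrs.get("href") for e in parse_elements(raw) if e.tag == "a"]
```

With the raw byte stream, every module would have to repeat the decoding and parsing itself; with wrapper objects, that cost is paid once by the proxy, though always parsing may add delay for modules that do not need it.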

