Master's Thesis Topic Ideas

By professor Jacob Palme,
Department of Computer and Systems Sciences
KTH Technical University

Last revision: 26-Oct-2007

A longer and more complete version of this document is available in Swedish.

Introduction

  If you are reading this document on paper, note that the web version at http://dsv.su.se/jpalme/master-thesis-ideas.html might contain a newer, revised version of this document.

Areas in Which I can Supervise a Thesis

 
  • Electronic mail, forum software, electronic publication, methods for human-to-human communication using computers.
  • Distributed applications, client-server systems.
  • Human-computer interaction.
  • Behavioural science studies of effects of computers and computer applications.
  • Question-answering, searching, knowledge organisation on the world wide web.
 

Specific suggested topics

 

New topic ideas are added at the top of this list, so the items at the bottom of the list may be quite old.

  • Analyze the menu structure for finding information through a categories list on the home page of http://web4health.info/ and subcategories. Suggest and test one or more alternative menu structures to make it as easy as possible for site visitors to find information on the web site using this menu structure.
  • User dialogs with a question-answering system: Web4Health uses a question-answering system. The overall system's performance is good, but we believe it could be better if we had a better understanding of user behavior. This master's thesis would search for patterns in user behavior, such as repeated chains of questions being asked, and come up with suggestions for how to meet user expectations even better.
  • Same as above for the Web4Health web site, can be done in either Swedish or English.
  • Would the http://web4health.info/ web site gain by establishing visibility in Second Life or its own application in Facebook? If so, how? (Note: The Swedish magazine "Internetworld" No. 9, November 2007, had an article in Swedish about developing applications in Facebook.)
  • Menu editor. Many websites, including Web4Health, include a large number of menus (lists of links to pages) in different formats. Design a prototype of a good editor for managing such menus, and evaluate the editor by user testing and keystroke-level model analysis.
  • KEI formula. When doing Search Engine Optimization, one important task is to find good search strings to optimize for. An ideal search string should be popular in queries but not have high competition on the web. Different formulas try to compute this in different ways. Compare different ways of computing KEI and find which is best.
  • Quality of search results in medical search engines. Compare the quality of search results for general search engines like Google, Yahoo, MSN and Teoma with specialized medical search engines (a list of such can be found at http://web4health.info/en/answers/proj-search-other.htm or, for Swedish, at http://web4health.info/sv/answers/proj-search-other.htm). Test with multi-word queries like "Causes of headache", "Symptoms of bulimia nervosa", "Interaction between Prozac and Naproxen", "Treatment of obesity". Can be done in Swedish on Swedish search engines, or in English on English search engines.
  • Time to view search result list. When people view a search result list from for example Google, a hypothesis might be that the time to view this list, before clicking on one of the alternatives, is lower if the chosen alternative is higher in the list. The log files of http://web4health.info/ could be used to investigate this, using the same method as previously done in Christer Dalevind's thesis.
  • Effects of ad color and placement on click frequency. Run some experiments on the Web4Health web site, which has 6000 visitors/day, to see how different coloring and placement of ads will affect how often visitors click on links in the ads.
  • Search Engine Optimization and Content Management: How can Search Engine Optimization (SEO) and Content Management Systems (CMS) be combined? Which existing systems provide such a combination, and what more could be done in such a combined system?
  • Use of synonyms in natural-language question answering: When people ask questions to a natural-language question-answering system, they may use different words for the same question. For example, they may type the question "Effects of divorce on children" or "Effects of parent separation on children" and mean the same thing. Thus, natural-language question-answering systems need to include synonyms of the terms and phrases they can handle.

    This can be handled either by manually creating such synonym lists, or by automatically using a synonym dictionary, or both combined. There are obvious pros and cons of both alternatives.

    This thesis will investigate these two alternatives, and find out by practical experiments which alternative gives the best question-answering result.

    You can do the thesis work on either Swedish synonyms or English synonyms, whichever you prefer. The natural-language question-answering system in http://web4health.info/ can be used as an experimental platform.

  • SQL versus files: The KOM2002 system used by Web4Health at present has a database made up of files. Make an investigation and specification of how the system could be converted to using MySQL, and an evaluation of pros and cons of making such a conversion.
  • Dual-step searching. When a person makes a search on the Internet, using a search engine like Google, the search is actually a two-step process:
    1. The person types a query string to Google and Google supplies a list of possible answers.
    2. The person scans through the list of answers provided by Google and selects the answer which seems to best fit what the person wanted to find. Sometimes, the person looks at more than one of the articles listed by Google before finding a good answer.
    Write a paper which studies Internet searching including both of these steps.

    As a conclusion of this two-step model, a search engine like Google may improve the search result by providing a variety of different kinds of answers. For example, if a person searches with the search string "flowers" it might be better for Google to return a list containing one article about where to buy flowers, one about flowers as a biological concept, one about how to draw flowers, one about computer tools for drawing flowers, one about an inventory of different kinds of flowers, etc. This might increase the success of the second step of the query more than if the list provided by Google contained ten articles about different flower shops or ten articles about flower biology.

    The paper could also involve an investigation of whether major search engines do in fact try to list a variety of different aspects of the query in the list of answers returned (usually the first 10 answers listed). This could be investigated by testing the different search engines.
  • Web-based editing. It is very practical to let an application run entirely on the server and only have an ordinary web browser as client. Unfortunately, it is not easy to design a good WYSIWYG text editor in this way. This thesis could evaluate the options, look at how other people have solved this, and maybe also develop a prototype of one variant as an example.
  • More and more often, gadgets which we use have computers in them, even though we do not regard them as primarily computers. Examples: Dishwashers, microwave ovens, mobile phones, video recorders, etc. Some of them have rather complex user interfaces. Evaluate the user interfaces of, for example, a set of mobile phones or a set of video/DVD hard disk recorders.

    Part of such an evaluation might be to ask a number of users of the kind of device which functions they most often use. Since such devices are something used very often, the user interface should not only be easy to understand, but the common actions should also require as few steps/clicks as possible.

    If you choose this topic, and want to look at video/DVD hard disk recorders, I can write a text about my own experience, which can be of use as a starting point for such a thesis.
  •  Important!  Development of a bilingual dictionary for evaluating term extraction within the psychological domain of the Web4Health portal. Powerful tools are available to make the process mostly automatic. For more information about this particular task, contact Andrea Andrenucci, e-mail <andrea@dsv.su.se>.
  • Is it possible to predict the position a web page will get in Google result listings? Various measurements of the competition do not seem to be well correlated with actual position. Why? Web4Health can be used as a test tool.
  • What is the correlation between the Wordtracker prediction and the real number of visitors to a page in Web4Health? Why?
  • Make an overview of multi-lingual content management techniques. I have written a paper on this at http://tinyurl.com/ce6ff, but your paper should also study what other people have done and describe what should be required of a good such system.
  • Define the state of the art in Cross Lingual Question Answering and find the weaknesses of the available approaches. What is missing and what can be done?

    A more detailed description of this task in Swedish can be found at http://dsv.su.se/jpalme/exjobbsforslag.html#KEI, but I can translate it to English if a non-Swedish student wants to tackle this task.
  • These two topics may be the same topic described in two different ways:
    • The web site http://Web4Health.info/ uses an older version of QuickAsk. A newer version of QuickAsk can convert questions to SQL statements, which may mean that the classification can be simplified in Web4Health. Try this out, maybe also make a test implementation and a comparison of the results in how well questions can be answered and if classification is simplified or not. You may want to look at semantic net or semantic web technology.
    • The web site Web4Health does not at present use semantic networks for question-answering. Investigate if use of a semantic network could improve the question-answering in this web site. Since it is a medical web site, a semantic network could for example use links like "cause", "diagnosis", "prognosis", "treatment", "side effects", etc.

  • Self-help guides: By self-help guides is meant computer programs to help people change their life in order to improve health, for example to stop smoking. There are many such programs. Make an overview of the area: how such programs usually work and how they should work, how they are usually designed and how they should be designed, and how useful they are in achieving the wanted change and improving health.
  • Home DVD recorders with hard disk. A new kind of product, DVD recorders with hard disk, has emerged during recent years in the consumer electronics market. There are also software packages for turning a PC or a Macintosh into a similar media center. Typical facilities of such media centers are to record television programs on hard disk or DVD in a format that ordinary DVD players can read. These are quite complex, compared to other consumer electronics systems, and their HMI design is often not ideal. In this topic, I suggest that some of the market-leading products of this kind are examined for their HMI qualities. I have two such machines at home, so I can provide some input based on my personal experience with these systems, on how to test and which attributes to measure. My general impression of the models I have at home is that the manufacturers have been in too much of a hurry to get a product out, and have not had time for enough user testing of the HMI. New models correct some HMI problems of the previous model, but instead introduce new HMI problems.
  • (Content Management Systems and standards: Investigate, for some common content management systems, how good they are at producing correct HTML and CSS, and how good they are at producing pages which are accessible to disabled people. Also look at whether they produce code that is friendly to search-engine robots (HTML links, not too much information hidden from robots by Javascript).)
  • Web page content effect on user behavior: The Web4Health web site contains about 900 informational pages, most of them available in multiple languages. Some of the pages are long and detailed, others are short and concise. Some of them contain a large number of links to other pages in or outside Web4Health, others contain no specific such links (a few general links are included in all pages). All the main pages contain a "Find a few related answers" button, which returns a list of related pages created by the Web4Health software.

    In this work, you will investigate how the difference between different Web4Health pages influences user behavior. You will need to use or write software to analyze the Apache logs, which show how users use Web4Health, and also software to analyze the Web4Health pages, counting the number of links, etc. Then this data is combined to give a description of how different content of different pages influences whether users stay or leave Web4Health, and other user behavior.

  • Clicking links versus searching: A study of the usage of the Web4Health web site shows that users click on internal site links more often than they use the built-in search engine. A previous master's thesis at DSV has investigated how easy it is to find information by using the built-in search engine in Web4Health and three other Swedish medical web sites. Extend this study in either of two ways:
    • Make a similar comparison of how easy it is to find answers to a question by clicking on links (including or excluding the "Find a few related answers" button in Web4Health) on the same four web sites.
    • Make a similar comparison on only Web4Health, studying in more detail why and how people choose to click on links more than use the search engine and whether this is actually optimal behavior in finding what they are looking for.
  • The Web4Health data base is at present available in English, Swedish and German, and partly in Italian, Finnish and Greek. Develop a version of the data base for a new language. It should be a language which you know very well. You will not have time to make a full version of the data base for a new language, but you might do a partial translation of the home page and menus, and have the actual texts provided in English. Also make an evaluation of how useful such a partial translation is for people using the language you have translated the web site to.
  • Cross-lingual question-answering German-English in Web4Health. The Web4Health web site provides natural-language question-answering in several languages, including German. By cross-lingual is meant that answers are not only found in German, but questions are translated to English and answers are also found in the English data base. Evaluate how this influences the results for German users of Web4Health: do they get better and more easily found answers than if answers were only found in the German data base?
  • Investigate and give an overview of how better editors could improve the editing environment in a content management system like KOM2002. Implement some of what you have found suitable. This task includes user interface design and user testing combined with the restrictions of what common web browsers allow using only HTML and Javascript.
  • Google secrets: Google has published some general principles of how it works. Some other investigators have tried to find out more. But there are also secrets, which Google will not divulge. First make a review of what is already known about how Google works. Then make a number of tests, in order to find out more about the inner secrets of Google. This is a task for an imaginative and clever person! More info. Another paper of interest.

    Note: Another student has written a recent thesis about this. But more work should be possible, in particular trying to develop the optimal formula, the one which has the highest correlation with Google page positioning.
  • (Compare the search tool QuickAsk as it is used in the web site Web4Health with other search tools like Google, with and without "site:web4health.info", Alkaline, SiteSeeker. Quality can be measured by measuring recall and precision, probably best restricted to the first 10 answers listed.)
  • Google API is a service offered by Google which allows people to develop different new services based on Google. Make an analysis of this offer, and develop your own applications based on the Google API.
  • (DSV is implementing a medical information web site (http://web4health.info). The site has more than 160 000 visitors (more than 2 million hits) per month. Make a study of how visitors use this web site, in particular try to draw conclusions about how the web site can be improved, based on the information in its log files. This task may include writing software to analyze the log files in new ways. Note: Of course you need not analyze all the information in these large files, you can select only a sample of them.)
  • Other studies of usage of the Internet for medical information and support, see for example http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14664725&dopt=Abstract
  • Client-side proxies can be used to remove content (cookies, Java, adverts), modify content (remove blink, GIF animation), provide security (anonymisation, blocking), log traffic, accelerate access, and add content (annotations, related links). Make a thorough study of this, test some client-side proxies, and propose your own ideas. See http://crit.org/. A well-known example is the Muffin proxy.
  • Altruism on the Internet. Why are people willing to help other people so much on the Internet? How common is it? Is it really altruism?
  • (Eriks Sneiders, a previous research student at DSV, has developed a system for providing web-based FAQs (collection of common questions and answers) and looks for research students who want to develop such FAQs on topics like, for example, Internet protocols. His e-mail address is Eriks Sneiders <eriks@dsv.su.se>. His system can be tested, but on a specific topic, on http://ekd.dsv.su.se/, click on FAQ.)
  • Pseudocurrency on the web, example: http://www.beenz.com. How much is it used, what are the benefits, and when will it be successful?
  • Develop a tool for caching web pages. In this topic, "caching" means creating a copy of a web page similar to the copies you get with the Cached command in Google result lists. The cache can be a directory with an HTML file and other files used, like images.
  • Make an evaluation of web sites for medical information to non-experts.
  • There are a number of well-known, but non-proven, beliefs in how a web page should be organised to be easy to read. It "should have 40-60 characters per line". It "should use Verdana or Georgia fonts, not use Times New Roman font". In a previous master's thesis, two students at DSV found that these beliefs were not true. Make new tests to check who is right, the common beliefs or what the DSV students found.
  • Analyze discussions in e-mail discussion lists or web-based forums. In particular, investigate when people agree and do not agree. How much of the discussions are disagreements? Do people understand each other? What kind of discussions are most constructive?
  • Investigate gateways between Internet mail and message systems like First Class and Lotus Notes. Do they work correctly, what are the problems?
  • Investigate software for protecting children against "unsuitable" information. Does it work? Does it suppress too much information which should not be suppressed?
  • Search Engine Optimization: Does it work? Is it ethical? Does it make search engines better or less good?
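
For the KEI formula topic in the list above, a comparison of formulas could start from a small script like the one below. It is only a minimal sketch. It assumes the commonly cited definition of KEI as popularity squared divided by competition, and adds a plain popularity/competition ratio purely as an illustrative second variant. The function names and the example numbers are made up for illustration and do not come from Web4Health or Wordtracker data.

```python
def kei_classic(popularity, competition):
    """Commonly cited KEI: popularity squared divided by competition."""
    if competition <= 0:
        # No competing pages at all: treat any popular term as ideal.
        return float("inf") if popularity > 0 else 0.0
    return popularity ** 2 / competition


def kei_ratio(popularity, competition):
    """Illustrative alternative: plain popularity-to-competition ratio."""
    if competition <= 0:
        return float("inf") if popularity > 0 else 0.0
    return popularity / competition


def rank_terms(stats, formula):
    """Return the search strings sorted from best to worst under `formula`."""
    return sorted(stats, key=lambda term: formula(*stats[term]), reverse=True)


# Hypothetical numbers: (searches per day, number of competing pages).
example_stats = {
    "causes of headache": (1000, 1_000_000),
    "symptoms of bulimia nervosa": (10, 1_000),
}

if __name__ == "__main__":
    for formula in (kei_classic, kei_ratio):
        print(formula.__name__, rank_terms(example_stats, formula))
```

With these made-up numbers the two formulas rank the two search strings in opposite order, which is the kind of disagreement the thesis would have to evaluate against real traffic data.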
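
For the search tool comparison topic in the list above, which suggests measuring recall and precision restricted to the first 10 answers, the measurement could be sketched as follows. This is a minimal sketch under stated assumptions: the result list contains each page at most once, the set of relevant pages has been judged by hand beforehand, and the function names and page names are hypothetical.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the first k returned answers that are relevant.

    Divides by the number of answers actually inspected, so a tool that
    returns fewer than k answers is not penalised for the missing ones.
    """
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for page in top if page in relevant)
    return hits / len(top)


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant pages that appear among the first k answers."""
    if not relevant:
        return 0.0
    found = set(ranked[:k]) & set(relevant)
    return len(found) / len(relevant)


if __name__ == "__main__":
    # Hypothetical result list from one search tool and hand-judged relevant pages.
    ranked = ["page-a", "page-b", "page-c", "page-d"]
    relevant = {"page-a", "page-c", "page-x"}
    print(precision_at_k(ranked, relevant))
    print(recall_at_k(ranked, relevant))
```

Running the same queries through each search tool and averaging these two numbers per tool gives one simple basis for the comparison.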
 

Other information

 

Common English language errors in master's theses.

Use of the word "I" in scientific papers.

Writing research papers - a step by step guide.

Detailed guide on writing scientific papers.