This paper is intended as a practice-oriented introduction to
Internet mailinglists as research material for German researchers in
Japanese Studies who have little previous experience with online
material.
After some general remarks about mailinglists I mention new
opportunities as well as limitations when dealing with mailinglists as
research sources and present some current approaches to mailinglist
research, with a focus on examples from social and cultural studies.
This is followed by a short analysis of what I think should be pursued
further and a brief introduction to my own project, which tries to fill
a gap I perceive between two kinds of existing studies. I end with some
recommendations of resources that can be useful for a Japan-related
mailinglist project.
Currently the online counterpart of this paper is available from
http://www.gmd.de/People/Irene.Langner/docs/publ.html. If the page
moves, it should be retrievable with the meta-keys "mailinglist
research" and "Japanologentag 1999".
Mailinglists as regarded in this paper are Internet-based electronic
discussion groups (as opposed to one-directional distribution lists).
On the server side they are administered by a list program (e.g.
listserv, listproc, majordomo, mailbase, lyris, mailman); for the
participants they are accessible via simple electronic mail.
In many respects mailinglists resemble (Usenet) newsgroups, bulletin
board systems (BBSs), forums in online services or mailbox-nets, but
in practice the different systems are often used by different people
and for different purposes (cf. Döring 1999:35f).
With list archives made accessible via the WWW and webboards having
similar functions, a merging of list and web use has recently become
observable in some areas.
Although the principle of electronic conferencing dates back more than
20 years to the early days of computer mediated communication (CMC),
the worldwide spread of the Internet now brings more diverse use and
international participation, which opens up new potential for
information and communication as well as for research.
Because of the low bandwidth needed, mailinglists are especially useful for people in less connected areas. They are an important means of communication for distributed interest groups, for pioneers or people with special needs or interests who do not find like-minded people or support locally. They can be particularly useful for academic and learning activities.
Mailinglists vary greatly in focus and style. Lists can resemble notice boards, news tickers, talkshows or self-help groups. Features like the quality of information contributed, the style of interaction, discussion or moderation differ considerably from list to list. Accordingly, lists can be examined in a variety of ways. Here I would like to distinguish roughly between three types of motivations for the study of mailinglists:
When dealing with mailinglists it is also important to recognize what
we do not know about the observed people and their communication.
Researchers who are interested in representative studies face the
problem that the Internet user population is still not representative
of the population at large, so generalisability can only be achieved
with respect to certain user groups (or an additional "offline" effort
has to be made).
What we see on a list may not be the whole picture necessary for a
sufficient understanding of what is going on.
E.g. there may well be additional private communication in the
background of a list that also influences list discussions but cannot
be observed by a list member. This is often the case with answers
to questions sent via personal mail, so we cannot judge whether an
information need has been satisfied by list members unless a
clarifying statement appears on the list.
In most cases the majority of subscribed members remain passive (Engl.:
"lurker", Jap.: "ROM" ("read only member")), so except for mail
addresses we often do not know anything about the largest part of such
a "group". Do they read messages at all, do they just log or archive,
is the mail account abandoned?
For the interpretation of certain behaviour "in real life" we are used
to taking into account additional visible information, such as
non-verbal indications of moods or intentions, gender, age, physical
condition or status of the people discussing. In a text-based online
context, however, such social cues about the participants are either
absent or cannot be relied upon. Arbitrary construction of
virtual persons is easily possible.
We cannot even assume that one mail address means one person. Behind
one e-mail address there may well be several people, other lists or
software agents.
E.g. "Tanaka Tomoyuki" is a well-known figure in newsgroups like
soc.culture.japan, but there has been a lot of speculation about his
actual identity.
If a poster really intends to hide his identity, clues about the
origin of an e-mail can almost completely be removed by using an
anonymous remailer.
In the case of a research project that includes automatic counting of
postings, threads, or communication relations, spam (i.e. unsolicited
commercial e-mail), off-topic postings and other "non-contributions"
may distort the picture.
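To illustrate how such "non-contributions" can be kept out of an automatic count, here is a minimal sketch in Python. It assumes the archive is available as a sequence of raw message strings; the function names and the deliberately crude spam rule are hypothetical and only stand in for whatever filter a project actually uses.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def count_postings(raw_messages, is_noise=lambda msg: False):
    """Count postings per sender address, skipping messages flagged as noise."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        if is_noise(msg):
            continue  # spam, off-topic etc. are left out of the count
        _, address = parseaddr(msg.get("From", ""))
        if address:
            counts[address.lower()] += 1
    return counts


def looks_like_spam(msg):
    # Hypothetical rule for illustration only; real spam needs better filters.
    subject = (msg.get("Subject") or "").lower()
    return "make money" in subject


sample = [
    "From: Alice <alice@example.org>\nSubject: Re: school servers\n\nText",
    "From: bob@example.org\nSubject: MAKE MONEY FAST\n\nText",
    "From: alice@example.org\nSubject: new thread\n\nText",
]
print(count_postings(sample, looks_like_spam))
```

Running the count with and without the filter shows how strongly the picture depends on what is treated as a contribution.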
On many mailinglists subscription and logging of all communication
is possible for anyone, but participants may not be aware of this fact.
E.g. a study about potentially embarrassing communication in
newsgroups showed a surprisingly low level of risk perception (Witmer
1998:140).
In addition, personal information gained through mailinglist
observation can be combined with other observations of online
behaviour, because Internet users leave traces and personal
information in all sorts of places: through e-mail, on news, lists,
websites etc. In summary, putting together several information
sources, user profiles of great detail can be generated and the
possibilities for misuse have increased enormously compared to "offline"
research, so the legitimacy of online research activities has to be
questioned in every case. As with other types of observation, there
is also the danger of destroying one's subject by "tearing it into the
light" through research (Smith 1999:211f).
In the paragraphs above I mentioned several opportunities for new kinds
of research, but also systematic limitations, as well as the need for
self-imposed restrictions for the sake of privacy protection.
Depending on one's main research interest these limitations
may be serious ones, but on the other hand, even in visible
communication situations there is always hidden but important
information involved. Advantages and disadvantages (cf. Döring
1999:206-208) have to be weighed according to the particular research
design.
Even if we can only see parts of a picture and concentrate on the part
of communication that constitutes the shared goods of a list
(be it more the information pool or a social value produced), the enormous
variety of mailinglist communication still makes it worthwhile to
take a closer look, e.g. at the different types of use and motivation
for participation.
In the following I would like to briefly introduce some recent characteristic examples of mailinglist studies, including newsgroup studies, because a similar methodology can often be applied. In particular I focus on features of the list material studied, the disciplinary context of the approach, the methods used and selected findings. Although a distinction between quantitative and qualitative approaches is used in order to characterize the main focus, I (with most of the authors mentioned) do not regard these approaches as mutually exclusive.
In a series of studies in the context of social network analysis the
authors conducted automated quantitative analyses of participation in
mailinglists, looking at the frequency of postings, social networks on
lists as seen through common threads, lurking behaviour, and
inter-list connections.
Results include the following: through a formal block analysis of
postings on one list over 14 months, they found unequal participation
in mailinglist discourse rather than the often claimed equality of
cyberspace communication. Positions and roles (measured in terms of
frequency of posting as well as communication relations) emerged on
the list as in real life (Stegbauer/Rausch 1999a).
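The block analysis used by the authors is not reproduced here. As a much simpler stand-in, unequal participation can be summarised from posting counts alone, for example with a Gini coefficient or the share of postings contributed by the most active members. The following Python sketch assumes a plain list of postings per member; both function names are hypothetical.

```python
def gini(counts):
    """Gini coefficient of posting counts: 0 means equal participation,
    values near 1 mean that a few members write almost everything."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted values.
    weighted = sum((rank + 1) * v for rank, v in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n


def top_share(counts, k):
    """Share of all postings written by the k most active members."""
    values = sorted(counts, reverse=True)
    total = sum(values)
    return sum(values[:k]) / total if total else 0.0


postings_per_member = [40, 12, 5, 2, 1, 0, 0, 0]  # invented example data
print(round(gini(postings_per_member), 2))
print(round(top_share(postings_per_member, 2), 2))
```

Lurkers appear here as members with a count of zero, so whether they are included in the input changes the result considerably.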
Using seven list archives covering two years (1996-98), the authors also
studied the role of lurkers and found that only 30% of all new
subscribers became active ("delurked") within one year. If people
delurked, they did so relatively soon after subscription. There were fewer
lurkers in high volume lists, which may be an indication that the
primary motivation for lurking is not to "free-ride", i.e. to get a
maximum of information for free.
People who lurked in one list sometimes were active in others, so
maybe lurkers can have an important function for connecting discussion
spaces. On a more fundamental level - given the large numbers of
subscribers on many lists - it can be said that the existence of
lurkers is one condition for the possibility of list communication,
because if everybody "talked" at once, message overload would lead to
the destruction of communication (Stegbauer 1999).
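A "delurking" rate of the kind reported above can be computed once subscription dates and first posting dates are known. The sketch below is only an illustration in Python: it assumes two dictionaries keyed by mail address, which is not necessarily how the authors organised their data, and the function name is hypothetical.

```python
from datetime import date, timedelta


def delurk_rate(subscribed, first_post, window_days=365):
    """Share of subscribers whose first posting falls within the window
    after their subscription date."""
    if not subscribed:
        return 0.0
    window = timedelta(days=window_days)
    active = 0
    for address, joined in subscribed.items():
        posted = first_post.get(address)
        if posted is not None and joined <= posted <= joined + window:
            active += 1
    return active / len(subscribed)


# Invented example data: four subscribers, two of whom ever posted.
subscribed = {
    "a@example.org": date(1996, 1, 10),
    "b@example.org": date(1996, 2, 1),
    "c@example.org": date(1996, 3, 15),
    "d@example.org": date(1996, 5, 20),
}
first_post = {
    "a@example.org": date(1996, 1, 25),   # delurked after two weeks
    "b@example.org": date(1997, 6, 1),    # first posting after the window
}
print(delurk_rate(subscribed, first_post))
```

Note that one address is treated as one person here, which is exactly the assumption questioned earlier in this paper.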
Another study examined the hypothesis that mailing lists lead to more
interdisciplinary contacts. Comparing the membership lists of 1300
academic lists of the UK Mailbase system, the authors found less
participation across disciplines than expected (Stegbauer/Rausch 1999b).
Harald Buck, in a quantitative study that also contains a detailed
description of the studied list, examples and interpretations, tested
several existing hypotheses about characteristics of e-mail against a
selection of postings from a German-language, research-oriented
mailinglist, namely:
- E-mail is a new text category of its own.
- E-mails contain comparatively many violations of norms for written text.
- E-mail lies in between written and oral communication.
- E-mail authors make use of discourse-supporting devices.
Out of the 735 mails from a 10-month discussion period Buck selected a
representative sample of 231 mails for his analysis.
In contrast to common judgements about electronic communication, the
mistake rates remained within common ranges and were rather dependent
on author and situation. Only limited closeness to oral communication could
be found, whereas several features of traditional letters (e.g. a three
part structure with greeting, main text and another greeting) were
found to be preserved. Language proved to be slightly informal, but
polite. There was a high degree of dialogue supporting functions
(quoting in over 57% of postings); emoticons were used as compensation
for channel reduction. In summary the author suggests refraining from
rash generalisations about e-mail and electronic communication.
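A figure such as the share of postings that contain quotations can be approximated automatically. The following Python sketch assumes that quoted text is marked by a leading ">" at the start of a line, a common e-mail convention; how Buck himself identified quotations is not stated here, and the function names are hypothetical.

```python
def has_quote(body):
    """True if any line of the message body starts with the '>' quote marker."""
    return any(line.lstrip().startswith(">") for line in body.splitlines())


def quoting_share(bodies):
    """Fraction of message bodies that contain at least one quoted line."""
    bodies = list(bodies)
    if not bodies:
        return 0.0
    return sum(has_quote(b) for b in bodies) / len(bodies)


bodies = [
    "Hello all,\n> What about the new server?\nIt is running now.",
    "Dear list,\nI have a question about modems.",
    "> > first reply\n> second reply\nI agree.",
    "Thanks to everybody!",
]
print(quoting_share(bodies))
```

Other quoting styles (for example indented text or forwarded messages without markers) would be missed by this rule and need separate handling.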
Jeanette Hofmann observed 6 months of list discussion on a technical
(IETF) mailinglist and contrasted form and content of the results of
her "lurking" observation with those of an interview carried out
with one of the main list debaters at a later stage.
Her own record of an important longer debate on the list is
constructed as a play in seven acts, where she identifies the main
actors, actor types, topics and open questions.
Insights gained from this observation include a sense of how
mailinglists reflect Internet technology development. The author notes
an extremely open and cooperative culture of discourse on the list, as
well as collective striving for solutions and common
interpretations. She attributes this cooperative behaviour to the
characteristic selection of participants on the list: Most of them are
pioneers and experts working at technical frontiers and are interested
in cultivating this new land.
As for the two different "windows" through which the ethnographer
looked at the events, she found the main differences not in the
faithfulness of the resulting picture, but in the selection and order
of events as well as the presentation style: Whereas on the list the
"techies" discussed without any recognisable care for possible
observers, and many voices and interpretations could be heard in
parallel, the interview remained restricted to selected
"important" topics; in hindsight, events were synthesized, interpreted
and explained, reasons analysed and connections drawn. The list
discussions focused more on the "how", the interview on the "why"
aspects. So in summary both sources complemented each other.
Looking at current Internet group communication studies, my impression
is that there is a gap between two clusters of common research
designs: on one side many small case studies with in-depth analyses
(often of experimental communication settings like in classrooms), and
on the other side some big formal (structural) computer-powered
studies of lists and groups with little reference to the
content discussed. What I find missing are medium- to large-scale,
thorough content analyses combined with computer-supported
cross-sections and investigations into quantifiable list features.
I regard Project H's larger-scale content analysis as pointing in this
direction; in that case it could be achieved through hand coding by a
large number of cooperating researchers. For projects with less
manpower, approaches like those of Fujitani/Akahori (1997, 1999),
who use computers for keyword extraction and summaries, are helpful.
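What computer-based keyword extraction can look like in its simplest form is sketched below in Python. This is a generic TF-IDF ranking, not the method of Fujitani/Akahori, whose systems are not described in detail here; the tokenizer handles only unaccented Latin letters and German umlauts, and all names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase runs of letters (incl. German umlauts)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def top_keywords(documents, k=3):
    """For each document, return the k words with the highest TF-IDF score."""
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {w: c * math.log(n / doc_freq[w]) for w, c in tf.items()}
        # Sort by descending score, then alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


mails = ["the server is down the server", "the modem is slow"]
print(top_keywords(mails, k=2))
```

Words that occur in every message receive a score of zero, so frequent function words drop out without a separate stop word list.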
In order to cope with larger amounts of data within a single person
project and without access to expensive dedicated text mining
machines, my suggestion would be to put some more consideration into
tools and methods for text extraction and analysis.
Unfortunately current software for text analysis is still lacking
standards and interoperability (Alexa/Züll 1999:134), so it can be
hard to find the right combination of tools for a specialized project.
Multilingual support cannot be taken for granted either. Unicode still
needs some time to find its way into common applications, so dealing
with, e.g., the 2-byte character sets of East Asian languages
sometimes requires yet another set of tools, and those
available are often not easy to use for the non-professional computer
user.
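The practical consequence for an archive of Japanese mails is that the raw bytes first have to be converted into one common representation. The Python sketch below tries a declared character set first and then three encodings that are common for Japanese text; the order of candidates and the function name are assumptions, and trial decoding of this kind is a heuristic that can misidentify short or unusual texts.

```python
# Candidate encodings, in the order they are tried (an assumption of this sketch).
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def decode_japanese(raw, declared=None):
    """Decode raw bytes into text; return (text, encoding_used).

    `declared` is the character set named in the mail header, if any.
    """
    order = ([declared] if declared else []) + [
        c for c in CANDIDATES if c != declared
    ]
    for encoding in order:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
    # Last resort: keep what is readable and mark the result as lossy.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


raw = "日本語".encode("euc_jp")
text, used = decode_japanese(raw)
print(text, used)
```

Once every message is held as Unicode text, the same counting and search tools can be applied to German and Japanese material alike.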
As one conclusion from this situation, I see the need for more interdisciplinary cooperation between social scientists interested in a certain content and tool specialists, e.g. from computer science, who would help to operationalise the research questions. Such cooperation would not only involve the production of new tools for special questions; social scientists could also learn from general paradigms and methodology in mathematics or information science.
My own project is a computer-supported qualitative content analysis of
two German and two Japanese mailinglists. The material consists of
five years of list archives (1994-1998), comprising more than 5000
e-mails, or about 30 MB of data. The list participants are mainly
school teachers who discuss the merits and problems of Internet use at
school.
My questions with respect to content mainly come from the field of
educational technology: What are the teachers' views on Internet
literacy, new roles in school education, and the challenges and
chances that learning in a globally networked context brings
about? What obstacles for Internet use at school do they observe?
Concerning formal aspects of communication I look at topic careers,
communication patterns and cultural differences.
On the methodological side my aim is to find ways for efficient
extraction, coding and analysis of relevant passages from an amount of
data that is too large to work through entirely by hand (for the
first steps of exploring the field I use metaphors from archaeology or
cartography). Because in the near future a lot more research material
will be available in electronic form and information overload is a
serious problem, I hope these methods will be useful not only for
extracting relevant information from mailing lists. One hypothesis is
that it should bring a substantial improvement for social scientists
to use comparatively simple and flexible software tools that meet
actual needs (my basis here is Linux, Emacs, Perl, Tk etc.).
With respect to research design I as the "domain expert" cooperate
with a "tool expert" in order to experiment with different approaches
to explore my material. Some quantitative cross-sections are intended
to help find relevant passages, which are then hand-coded. The
terminology that emerges is in turn prepared for further processing.
Altogether, a grounded-theory-like approach is used for the generation
of hypotheses and possibly theory elements.
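One simple form of such a cross-section is a keyword-in-context search that pulls out every passage around a given term so that it can be read and coded by hand. The Python sketch below is an illustration only and not the tool set actually used in the project (which is based on Perl and Emacs); the function name and the window width are assumptions.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return the passages surrounding each occurrence of keyword.

    Matching is case-insensitive; `width` is the number of characters
    kept on each side of the match.
    """
    hits = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append(text[start:end])
    return hits


mail = (
    "In our school the teachers discuss Internet literacy a lot. "
    "Literacy in this sense includes judging sources."
)
for passage in keyword_in_context(mail, "literacy", width=20):
    print(passage)
```

Terms that emerge during hand-coding can be fed back into the same search, which corresponds to the iterative procedure described above.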
Finally, without going into detail, I would like to introduce some
sources and tools that could be useful for the pursuit of
Japan-related mailinglist studies.
As for research material there are huge lists of Internet mailinglists
available, e.g. at http://mlnews.com/jp.
Usenet newsgroups can be found under the fj.* hierarchy. There is also
a variety of discussion forums in online services like Niftyserve.
On the tool side, electronic dictionaries, word-separating and
stemming software as well as indexing software can be useful for
certain forms of searches. A comprehensive list has been compiled by
Baba Hajime at
http://www.kusastro.kyoto-u.ac.jp/%7Ebaba/wais/other-system.html.
It is also possible to write one's own scripts, e.g. in
Perl, using electronic dictionaries or word lists.
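As an illustration of what such a script might do, the following sketch separates unspaced Japanese text into words by greedy longest match against a word list. It is written in Python rather than Perl, the tiny lexicon is invented, and the function name is hypothetical; real word lists are far larger and greedy matching makes errors that dedicated morphological analysers avoid.

```python
def segment(text, lexicon, max_len=8):
    """Split text into words by greedy longest match against a word list.

    Characters that start no known word are emitted one at a time.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


lexicon = {"学校", "インターネット", "利用"}  # invented mini word list
print(segment("学校のインターネット利用", lexicon))
```

The resulting word sequences can then be counted or indexed in the same way as space-separated German or English text.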
In the area of educational technology in Japan there are a number of
efforts at content extraction from mailing lists and other types of
electronic communication. E.g. at the Akahori lab at Tokyo Institute
of Technology (http://www2.ak.cradle.titech.ac.jp/) S. Fujitani,
M. Ishihara and K. Akahori have developed systems for the extraction of
topics (via keywords and key sentences) from educational mailinglists
as a service to newcomers.
At Yano lab of Tokushima University
(http://www-yano.is.tokushima-u.ac.jp) Y. Yano, H. Ogata, T. Fukui,
N. Furugori et al. also deal with electronic group communications.
For a general collection of Japanese capable software cf. the Monash
"Nihongo" archive which has a mirror in Duisburg:
ftp://ftp.uni-duisburg.de/pub/mirror/nihongo/monash/.
Alexa, Melina, Züll, Cornelia (1999): A Review of Software for Text Analysis. Mannheim: ZUMA. ftp://ftp.zuma-mannheim.de/pub/zuma/zuma-nachrichten_spezial/znspezial5.pdf
Berthold, Michael et al. (1998): It Makes Sense: Using an Autoassociative Neural Network to Explore Typicality in Computer Mediated Discussions. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 191-219.
Buck, Harald (1999): Kommunikation in elektronischen Diskussionsgruppen. NETWORX Arbeiten im Netz zum Thema Sprache und Internet Nr. 11. http://www.websprache.uni-hannover.de/networx/docs/networx-11.htm
Döring, Nicola (1999): Sozialpsychologie des Internet. Die Bedeutung des Internet für Kommunikationsprozesse, Identitäten, soziale Beziehungen und Gruppen. Göttingen: Hogrefe.
Fujitani, Satoru, Akahori, Kanji (1997): Summarized Keyword Sampling System for Mailing-list Review [in Japanese]. In: JCET 5, Sep. 11-13, 1997, Tokyo: pp 629-630.
Fujitani, Satoru, Akahori, Kanji (1999): A Summary Sentence Extraction Method for Web-based Mailing List Review Application and Its Effectiveness Study. In: Geoff Cumming, Toshio Okamoto, Louis Gomez (Eds): Advanced Research in Computers and Communications in Education, Vol. 1. Amsterdam etc: IOS Press, pp 327-334.
Hofmann, Jeanette (1998): "Let A Thousand Proposals Bloom" - Mailinglisten als Forschungsquelle. In: Bernad Batinic et al. (Eds): Online Research. Göttingen: Hogrefe. http://duplox.wz-berlin.de/texte/gortex/.
Ishihara, Masayoshi, Akahori, Kanji (1998): Development of a System to Generate Digests of Internet Articles for Supporting Discussions [in Japanese]. In: Nihon Kyouiku Kougaku Zasshi Vol. 22 No. 1, 1998, pp 1-12.
King, Storm (1996): Researching Internet Communities: Proposed Ethical Guidelines for the Reporting of Results. In: The Information Society 12(2), pp 119-28.
Ogata, H., Yano, Y. (1999): Combining Social Networks and Collaborative Learning in Distributed Organisations. In: Betty Collis, Ron Oliver (Eds): Proceedings of Ed-Media 1999, Vol. 1. Charlottesville: AACE, pp 119-125.
Rafaeli, Sheizaf et al. (1998): ProjectH: A Collaborative Quantitative
Study Of Computer-Mediated Communication. In: Fay Sudweeks, Margaret
McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups
on the Internet. Menlo Park: AAAI Press/MIT Press, pp 265-81
http://www.arch.usyd.edu.au/~fay/papers/techrep.html
Rafaeli, Sheizaf, Sudweeks, Fay (1998): Interactivity on the Nets. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 173-189.
Smith, Marc (1999): Invisible Crowds in Cyberspace: Measuring and Mapping the Social Structure of USENET. In: Marc Smith, Peter Kollock (Eds): Communities in Cyberspace. London: Routledge Press, pp 195-219.
Stegbauer, Christian, Rausch, Alexander (1999a): Ungleichheit in virtuellen Gemeinschaften. In: Soziale Welt '99. Zeitschrift für sozialwissenschaftliche Forschung und Praxis, Heft 1, pp 93-110.
Stegbauer, Christian (1999): Die Rolle der Lurker in Mailinglisten.
Vortrag auf ISKO'99 Hamburg, 23.-25.09.1999
http://www.bonn.iz-soz.de/wiss-org/beitraege/Stegbauer.doc.
Stegbauer, Christian, Rausch, Alexander (1999b): Fragmentierung oder
Integration - Untersuchung zur thematischen Überschneidung von Mailinglists.
Vortrag auf German Online Research (GOR '99) Nürnberg, 28.-29.10.1999
Abstract:
http://www.dgof.de/tband99/pdfs/q_z/stegbauer_ad.pdf.
Sudweeks, F., McLaughlin, M., Rafaeli, S. (Eds) (1998): Network and Netplay. Virtual Groups on the Internet. Menlo Park, Cambridge, London: AAAI Press. cf. http://www.it.murdoch.edu.au/~sudweeks/projecth/netplay.html.
Witmer, Diane, Katzman, Sandra (1998): Smile When You Say That: Graphic Accents as Gender Markers in Computer-Mediated Communication. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 3-11.
Witmer, Diane (1998): Practicing Safe Computing: Why People Engage in Risky Computer-Mediated Communication. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 127-146.