Planet Social Media Research

crowd

February 08, 2010

Data Mining: Text Mining, Visualization and Social Media

Visualizing Vultures

Driving back from the slopes on Saturday, we got caught in a jam caused by an accident on the 520 bridge over Lake Washington. The traffic map shows that although the crash was in the westbound lanes (the top of the map), rubbernecking slowed the eastbound lanes just as severely. For the record, we couldn’t find any tweets during our 30 minutes in slow traffic detailing what had happened, but traditional sources and access to web cams did help.


by Matthew Hurst at February 08, 2010 04:34 PM

February 05, 2010

Complexity and Social Networks Blog

Our Homogeneous Social Networks

This past week I was at a workshop held at GDI on "The Social Data Revolution" with a number of executives from industry as well as entrepreneurs from around the world. The participants had been explicitly chosen to form a unique group that would discuss and formulate key issues around the explosion of social data from online sources, as well as data traces from phones and other sensors.

While the workshop was very engaging and had some interesting discussion, for me one of the most fascinating things was actually a conversation I had with a few of the participants over dinner.

I brought up the point that while on the face of it the group at the workshop seemed fairly diverse, with people from Asia, the US, and various countries around Europe, we were actually part of a very closed network and were coming at the issues of the workshop from very similar perspectives. Most of us had graduated from top universities and had strong connections to academia (particularly around social media for this group), and most of us were heavily involved in the technology sector. Of the invited speakers, about half of us knew each other, and naturally the executives from the participating companies knew the other executives in their company.

This raises the question of exactly how many unique perspectives were actually being brought to the table. We can think about this as a sort of more unconstrained version of groupthink, one sharpened by the fairly flawed assumption that respected academics and executives are smarter than everyone else.

When I brought up this point at dinner, there was wide agreement that this was an important problem, and many of us had found ways to break out of these closed-off communities. For example, every week I take a Brazilian Jiu-Jitsu class where I interact with many people who come from completely different parts of society: electricians, construction workers, and immigrants, as well as graduate students from local universities and biotech researchers.

Then a German researcher at the table mentioned that they actually held a public bus driver's license, and once a month they would drive a public bus around their city to get that interaction with other groups of people. This prompted a long stare from a Swiss academic at the table, who murmured, "I also drive a public bus a few times a year."

This was so incredible that everyone at the table burst out laughing. We were such a homogeneous group that we even mirrored each other in the obscure ways we tried to break out of it! It's not as if having a bus driver's license on the side is common practice in Europe. This really drove home for me the importance of branching out from your core network in many facets of your life, since we are so similar to the people we know that we will probably attempt to branch out in very similar ways.

Of course now I wonder if all of the bus drivers on my ride to work actually moonlight as academics...

February 05, 2010 02:03 PM

February 04, 2010

Connected Action

NodeXL update: v.1.0.1.110 – New histograms of network metrics on Overall Metrics worksheet

In the previous release of NodeXL we added new metrics that describe networks in terms of their number of components and the length of paths in those networks. In this release we automate the creation of histograms of network metrics. It is useful to see the distribution of attributes like in-degree or betweenness to get a feel for the nature of a network. Building one histogram in Excel is easy, but building seven (one for each of the metrics we compute: degree, in-degree, out-degree, betweenness, closeness, eigenvector centrality, and clustering coefficient) is a chore, and doing this repeatedly for several networks is too much work! Now, when you calculate metrics, NodeXL will create these charts for you and place them on the Overall Metrics worksheet.
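The two steps this release automates, computing a per-vertex metric and binning its distribution, can be sketched in a few lines of Python. This is a minimal sketch, not NodeXL's implementation (which runs inside Excel); it assumes a hypothetical directed edge list of (source, target) pairs, uses in-degree as the metric, and picks equal-width bins whose count is an arbitrary parameter.

```python
from collections import Counter

def in_degrees(edges):
    """Return {vertex: in-degree} for a directed edge list of (source, target) pairs."""
    vertices = {v for edge in edges for v in edge}
    counts = Counter(target for _, target in edges)
    # Vertices that are never a target still get an in-degree of 0 (Counter returns 0).
    return {v: counts[v] for v in vertices}

def histogram(values, bin_count=5):
    """Count values in `bin_count` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / bin_count or 1
    bins = [0] * bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        bins[min(int((v - lo) / width), bin_count - 1)] += 1
    return bins

# Hypothetical reply network: ("ann", "bob") means ann replied to bob.
edges = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "ann"), ("cat", "ann")]
degrees = in_degrees(edges)
print(histogram(list(degrees.values()), bin_count=3))  # [2, 0, 2]
```

Equal-width bins are only one choice; a histogram tool may instead let the user pick bin edges, which matters for heavily skewed metrics like betweenness.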

We will add axis markings and titles soon, making these charts ready to use in a variety of network reports.  These histograms will also appear in the Dynamic Filters dialog to guide users as they select segments of the distribution to include or filter out of the displayed network graph.
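Filtering a segment of a metric's distribution out of the displayed graph, as the Dynamic Filters dialog is described as doing, amounts to keeping the vertices whose metric value falls within a chosen range plus the edges between them. The sketch below assumes exactly that behavior; the function name `filter_by_metric`, the inclusive bounds, and the sample data are hypothetical, and this is not NodeXL code.

```python
def filter_by_metric(edges, metric, low, high):
    """Keep vertices with low <= metric[v] <= high, plus the edges joining two kept vertices."""
    kept = {v for v, value in metric.items() if low <= value <= high}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges

# Hypothetical directed network and its in-degree values.
edges = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "ann"), ("cat", "ann")]
in_degree = {"ann": 2, "bob": 3, "cat": 0, "dan": 0}
vertices, visible_edges = filter_by_metric(edges, in_degree, low=1, high=3)
print(sorted(vertices), visible_edges)  # ['ann', 'bob'] [('ann', 'bob'), ('bob', 'ann')]
```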

Other updates:

1.0.1.110 (2010-02-03)

  • The Overall Metrics worksheet now includes more information about the degree, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, and clustering coefficient metrics when those metrics are computed. The additional information includes the minimum, maximum, average, and median metric values, and a histogram showing the metric value distribution.
  • The “Convert Old Workbook” item on the NodeXL, Data, Import menu in the Ribbon is now called “Import from NodeXL Workbook Created on Another Computer.” This menu item can be used to work around the following problem: NodeXL workbooks created on a 64-bit Windows computer cannot be opened directly in Excel on a 32-bit Windows computer, and vice-versa. (If you attempt to do so, you will get an error message whose details include “could not find a part of the path.”)
  • A Clear All Worksheet Columns Now button has been added to the Autofill Columns dialog box (NodeXL, Visual Properties, Autofill). Also, you can now clear an individual worksheet column by clicking a button in the dialog box’s Options column.
  • Bug fix: On large-font machines, the buttons at the bottom of the Autofill Columns dialog box didn’t fit within the dialog box.
  • Bug fix: In some circumstances, vertices were drawn below the bottom of the graph pane and were impossible to see. One such circumstance was when the selection was exported to a new workbook (NodeXL, Data, Export, Selection to New NodeXL Workbook). The graph pane in the new workbook acted as if it were taller than its real height, leading to vertices dropping off the bottom.
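The summary statistics listed in the first item above (minimum, maximum, average, and median) can be reproduced for any metric column with Python's standard `statistics` module. A minimal sketch: the betweenness values are made up for illustration, and whether NodeXL computes the median the same way (averaging the two middle values for an even count) is an assumption.

```python
import statistics

def summarize(values):
    """Minimum, maximum, average, and median of one metric's values."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "average": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical betweenness centrality values for five vertices.
betweenness = [0.0, 1.5, 4.0, 0.5, 2.0]
print(summarize(betweenness))
```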



by Marc Smith at February 04, 2010 10:42 PM

February 03, 2010

Data Mining: Text Mining, Visualization and Social Media

Tree Map Visualization of Budget

The New York Times has an interesting visualization of the budget proposal. While this gets off to a good start, it doesn’t really deliver due to the lack of true zooming. When you zoom into a portion of the image, the labels detailing the new context are missing.


by Matthew Hurst at February 03, 2010 03:12 PM

UMBC Ebiquity

Kaggle aims to host data-driven machine learning competitions

Kaggle is a site for data-related competitions in machine learning, statistics and econometrics. Companies, researchers, government and other organizations will be able to post their modeling problems and invite researchers to compete to produce the best solutions. The Kaggle demo site currently has three example competitions to illustrate how it will work and expects to host the first real one in March. Kaggle’s competition hosting service will be free, but the site says that it plans to “offer paid-for services in addition to its free competition hosting.”

by Tim Finin at February 03, 2010 11:36 AM

Data Mining: Text Mining, Visualization and Social Media

Search and Social Media 2010

Tomorrow I will be attending Search and Social Media (at WSDM) - and am very much looking forward to it. While thinking about the workshop I realised that the title can lead to endless debates around the definition of 'search' and 'social media', so I think it makes a lot of sense to break things down by aspects of the social media space. In particular: search and real time content, search and community, search and influence, authority and popularity, search and geolocated content, search and social networks, etc.

When breaking things down in this manner, we can also reflect on the notion of 'search'. There are many interactions with social media that currently don't follow the traditional search dialogue. TopicFire and Techmeme, for example, are more focused on real time analytics than on search as a primary mode of interaction. Similarly, Bing, Google, Yelp and many others provide a 'what's nearby' interaction that uses context to parameterize content in the spatial dimension, much as TopicFire and Techmeme use time ('what's nearby' versus 'what's right now').
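To make the time-versus-space analogy concrete, here is a minimal sketch that parameterizes one pool of posts by context in the two ways described: a 'right now' filter on posting time and a 'what's nearby' filter on great-circle (haversine) distance. The item fields, the one-hour window, and the 5 km radius are hypothetical choices; none of this reflects how TopicFire, Techmeme, Bing, Google, or Yelp actually work.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def right_now(items, now, window=timedelta(hours=1)):
    """Temporal context: keep items posted within `window` before `now`."""
    return [it for it in items if timedelta(0) <= now - it["posted"] <= window]

def nearby(items, lat, lon, radius_km=5.0):
    """Spatial context: keep items located within `radius_km` of the user."""
    return [it for it in items if haversine_km(lat, lon, it["lat"], it["lon"]) <= radius_km]

# Hypothetical posts: one in Seattle ten minutes ago, one in Portland two hours ago.
now = datetime(2010, 2, 3, 12, 0)
posts = [
    {"id": "a", "posted": now - timedelta(minutes=10), "lat": 47.6062, "lon": -122.3321},
    {"id": "b", "posted": now - timedelta(hours=2), "lat": 45.5152, "lon": -122.6784},
]
print([p["id"] for p in right_now(posts, now)])
print([p["id"] for p in nearby(posts, 47.6062, -122.3321)])
```

The two filters share one shape: a context value (a timestamp or a location) plus a tolerance (a window or a radius) selects the content.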

by Matthew Hurst at February 03, 2010 01:17 AM

January 31, 2010

Data Mining: Text Mining, Visualization and Social Media

Social Psychologists in Las Vegas

I've just returned from a brief trip to Las Vegas where I had been invited by Sam Gosling and Kate Niederhoffer to participate in a panel at the annual meeting of the Society for Personality and Social Psychology. I was very happy to participate with Tom Lento from Facebook and Winter Mason from Yahoo Research. The panel topic was Forging connections between Social Media and Social/Personality Psychology and was intended to help bridge the gap between researchers in the social-psych field and those of us in industry working with social and personal data.

Perhaps the most salient point that came out of the discussion was the difference between current social-psych methods and the data scale of the online social world. To help understand this new world, and for social-psych researchers to make the transition, an ability to code and to deal with large amounts of data is required. There was genuine interest in this (provoking questions about the specifics we had in mind, such as working with scripting languages) and I believe a real interest in the opportunity that lies in this data rich online world.

In terms of facilitating interactions, Winter asked the audience about their knowledge of key conferences for web data analytics, such as WWW and ICWSM. I consider the latter to be a perfect forum for this discussion to continue, and with Sam's help I think this can really happen.

The conference itself is run quite differently from what computer scientists are used to. No papers are submitted, and acceptance (which runs high) is based on short abstracts (generally a paragraph). I managed to see one of the poster sessions and was fascinated by the topics (extremely focused accounts of certain behaviours and traits) and the methodologies.

I feel that there is a lot to learn, and that the ultimate winners in the social space will be those companies farsighted enough to really embrace this field and to figure out how to integrate such insights about users into their products.

by Matthew Hurst at January 31, 2010 06:57 AM

January 30, 2010

UMBC Ebiquity

UMBC global game jam live video feed

Via Marc Olano: The Global Game Jam is into its second day at UMBC with 41 registered participants working on seven games. Keep up from home with our live video feed and games list.

by Tim Finin at January 30, 2010 11:45 PM

Connected Action

Book in progress: “Analyzing Social Media Networks with NodeXL: Insights from a Connected World”


Along with Professors Ben Shneiderman (Computer Science/Human-Computer Interaction Lab) and Derek Hansen (College of Information Studies) from the University of Maryland, I am writing and editing a book about analyzing the social media networks that form whenever people link or reply to one another, or favorite, rate, read, or edit data about other people or their objects. Social media networks can be analyzed using the methods of social network analysis, the mathematical application of graph and network theory to the social sciences. Using social network analysis, collections of connections can be analyzed and compared to identify key people and groups and to measure changes over time and following interventions.
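One of the simplest ways to find groups in a collection of connections is to split the network into connected components, the notion behind the component counts NodeXL reports among its metrics. The sketch below is a minimal breadth-first search over a hypothetical reply list, with edge direction ignored; it is only a first approximation of "groups," since clustering methods can divide a single component into several communities.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group vertices into connected components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for s, t in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical replies: ann, bob and cat talk to each other; xia and yul form a separate pair.
replies = [("ann", "bob"), ("bob", "cat"), ("xia", "yul")]
groups = connected_components(replies)
print(len(groups))  # 2
```

Vertices with no edges at all never appear in `neighbors`, so a real tool would also need the full vertex list to count isolates as single-vertex components.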


I am pleased to announce that we have signed with Elsevier/Morgan Kaufmann to produce a book, Analyzing Social Media Networks with NodeXL: Insights from a Connected World, for Summer 2010 delivery!


A map of the relationships among the people who tweet a particular keyword can lead to the discovery of the key hubs and influential people in the network. A social network analysis of reply patterns in email collections displays clusters around projects and highlights key people and relationships. Visualizing the connections among your friends in Facebook can reveal the various life stages and communities in which you have participated. When you chart the links between videos and users in YouTube, content with interesting network properties is exposed, based on well-connected content creators and influential commentators. A graph of the individual connections between flickr users illustrates the emergent formation of groups around social networks, locations, and topics.

These kinds of social media network data collection, scrubbing, analysis, and display tasks have historically required a remarkable collection of tools and skills. A great example of the variety of tools that can be used in concert to extract, analyze and display social media networks can be found on Drew Conway’s blog. This is a powerful set of tools for those who can master the demands of Python and web APIs. In contrast, the NodeXL project has taken the approach of providing an end-user GUI application, built within the framework of Excel 2007, for performing basic social media network analysis and visualization for non-programmers. The Python path is certainly the high road for experts and those with demanding data volumes or esoteric data requirements. But for the non-coding user, NodeXL may be one of the easiest ways both to manipulate network graphs and to get graphs from a variety of social media sources.

There are already some materials available to guide new users interested in learning about NodeXL, social networks, and social media.  A video tutorial for NodeXL demonstrates the extraction of the network of people in twitter who mentioned the term “digg”.  A tutorial guide to NodeXL offers a step by step guide to features in the NodeXL toolkit (with supporting data sets).  But the book will capture the theory, history, domain and process of social media network analysis in a single volume.

The volume contains a broad introduction to social media, social networks, and the operation of the NodeXL application, followed by a series of chapters from leading researchers, each focusing on a particular social media system (email, Facebook, Twitter, YouTube, flickr, Wikis, the WWW hyperlink network) and the networks it contains (replies, friends, follows, subscribes, comments, favorites, edits, links, etc.). A final chapter outlines a programmer’s view of the NodeXL code, in contrast to the code-free approach of the rest of the book.

Our intended audience is the mostly non-programming population that is interested in social media and the techniques of social network analysis. The volume is largely in the form of a how-to guide, and readers can follow along and replicate all of the examples. Using your own free and open copy of NodeXL, you will be able to use sample data sets or create similar live queries that map relationships in social media systems.

We have an ambitious production schedule, so the book may be on a bookstore shelf or in online retailers' search results in summer 2010.

Table of contents…

1. Introduction
2. Social media
3. Social Network Analysis
4. Hands on SNA: Learning by doing – Network Layout
5. Hands on SNA: Learning by doing – Network Metrics
6. Hands on SNA: Learning by doing – Network Filtering
7. Hands on SNA: Learning by doing – Network Clustering
8. Email
9. Lists, message boards, and communities
10. Twitter: Scott Golder and Vladimir Barash, Cornell University
11. Facebook: Bernie Hogan, Oxford Internet Institute
12. WWW: Robert Ackland, Australian National University
13. flickr: Eduarda Mendes-Rodriguez and Natasa Milic-Frayling, University of Porto and Microsoft Research, Cambridge
14. YouTube: Jen Golbeck and Dana Rotman, University of Maryland
15. Wikipedia: Ted Welser, Patrick Underwood, Dan Cosley, Derek Hansen, and Laura Black, Ohio University, Cornell University, and University of Maryland
16. NodeXL for programmers: Tony Capone, Microsoft Research



by Marc Smith at January 30, 2010 03:00 PM

Joseph Reagle on Wikipedia

Punditry and The Web 2.0 debate

I've been following the discussion at the Web 2.0 Forum with interest. In summary, Michael Gorman of Encyclopaedia Britannica complains that Web 2.0, and its supposed champion Wikipedia, is a "digital tsunami" (Gorman 2007jer) threatening education, scholarship, and the underlying values of Western civilization (Gorman 2007ssi1). Yet, while I follow the discussion with interest, I don't find it substantively engaging. Many of the arguments, particularly Gorman's, tend to be characterized by unsubstantiated claims and the purposeful construal of nuanced issues as extremes -- propping up strawmen for subsequent potshots. As I've already indicated, while such arguments might bring pundits a sense of righteousness and attention, in the end "Time, not arguments, will ultimately tell." (And for this reason I appreciate Larry Sanger's continuing efforts to implement his vision.)

Why, then, do I find this discussion of interest? Punditry, communicative disorders, and history. First, I'm trying to come to an understanding of "punditry," and I think Gorman's recent blog posts are an exemplar. My sense is that sometimes people argue for argument's sake. That is, even if they genuinely believe the thing they are arguing for, attention, not persuasion, is the goal. (In a sense, perhaps it is a high-brow, and perhaps more genuinely held, form of trolling -- another interesting phenomenon.) Second, I'm interested in communicative disorders. For example, Gorman faults Wales for allegedly saying "If you can't google it, it doesn't exist." (This quotation was originally unsourced, challenged by Wales, sourced by Gorman, and the disagreement continued.) But earlier in the same essay, Gorman himself complains "More solid and reputable websites are buried by the current algorithms of the Internet because they are often fee-based and cannot garner as many links as free sites (links are key to boosting one's search engine rank)" (Gorman 2007ssi1). If Wales made such a claim, I would expect it to have been a descriptive statement rather than a normative one. That is, this is largely the way it is, versus the way it should be. And this is essentially the same thing Gorman notes, and laments, above: if one's content is not freely accessible, it is "buried." And I can't imagine anyone claiming that all nondigital information should be dismissed out of hand and perish from the earth. The real issue is the normative response one should make in light of the "Google description": make information freely accessible, or enable Google to index proprietary sources and even nondigital media (e.g., old books). This, to me, is an interesting question, something that is happening today, and something I would like to learn more about. For example, what kind of arrangements does Google make with fee-based sites to index their content? (JSTOR articles are often prominent results in my Google queries.) What is the user response to a search result that is not immediately available to them? Through purposeful misunderstanding, punditry confuses genuine grounds for agreement and disagreement, and possible understanding.

My final reason for my interest is the way this debate parallels similar discussions throughout history. That's right: as I argue elsewhere on this site, and in my forthcoming dissertation, reference works frequently act as a flashpoint and focus for larger social anxieties about change. I suspect my argument here is a consequence of a historical sensibility: what most people see as an extraordinary shift appears, with perspective, to be one thread in a larger tapestry. For example, Gorman's concern with plagiarism is ahistorical. Again, elsewhere I argue that the history of reference works is a history of plagiarism. When the Britannica was in its infancy, much like the Wikipedia today, its founding editor admitted he "made a Dictionary of Arts and Sciences with a pair of scissors, clipping out from various books a quantum sufficit of matter for the printer." (Yeo 2001:180) I'm not condoning plagiarism, but I find that if we want to make normative statements about how things should be, it is best to genuinely understand the way things are, and have been in the past. This seems to be a difficult task in the midst of punditry, without some level of scholarly remove or Neutral Point of View -- as Wikipedians say.

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Civility without neutrality

The Neutral Point of View (NPOV) policy was much discussed at this month's Wikimania 2006. There was, of course, my own presentation asking "Is Wikipedia neutral?," but I am not alone in appreciating the importance and value of this notion. However, two other discussions during the conference made me think that perhaps neutrality is sometimes overvalued.

The first issue was whether neutrality should be a policy on all Wikis. Many think not, and my own response was that Ward Cunningham's Wiki -- the first -- was not neutral: it advocated for a particular type of software development practice. Such "perspective making" (Boland and Tenkasi 1995) within a community is an important function.

The second issue was whether there is something we can learn from neutrality without actually having to be neutral. Indeed, there is: civility. During the conference I remarked to a colleague that neutrality and civility are often conflated because neutrality roughly necessitates civility. But that does not mean that, absent the neutrality requirement, we must be rude. Kingwell (1995:247) makes an interesting argument that in a pluralistic society it is too much to ask that we have "genuine respect" for everyone. Civility only asks that we (initially) treat others as if they were worthy of respect and understanding. This notion, in the Wikipedia lexicon, is that of "good faith," a notion often connected to, but distinct and separable from, the neutral point of view.

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Nature's Wikipedia and Encyclopedia Britannica Analysis

Those interested in Wikipedia are discussing Nature's comparison of the errors appearing in a sample of 42 articles from Wikipedia and Britannica. While I agree with Jakob Voss's comments on the limitations of the study, for this sample the number of errors does seem roughly comparable between Britannica and Wikipedia -- hopefully that Wikipedia outlier for Dmitri Mendeleyev will be fixed soon. I was further intrigued to note that the errors per topic correlate between the two:

[Figure: errors per topic, Wikipedia (WP) v. Encyclopaedia Britannica (EB)]

This is a fairly strong correlation (r = 0.574), implying perhaps a similarity in the difficulty of writing on that topic, or perhaps a difference in scrutiny by the experts (e.g., the person reviewing the Cambrian explosion is picky!).
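For readers who want to compute such a figure themselves, Pearson's r is the covariance of the two series divided by the product of their standard deviations. A minimal sketch in Python; the per-topic error counts below are placeholders invented for illustration, not the data from the Nature study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors in covariance and variance cancel, so raw sums suffice.
    return cov / math.sqrt(var_x * var_y)

# Placeholder per-topic error counts, NOT the Nature data.
wp_errors = [3, 5, 2, 7, 4]
eb_errors = [2, 4, 2, 5, 4]
print(round(pearson_r(wp_errors, eb_errors), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient. With only one pair of counts per topic, a small sample like this one leaves r fairly sensitive to a single outlier article.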

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Magnus and Sanger on Expertise

In my previous entry I commented on one of the articles in the Episteme Wikipedia issue. I thought I would also share my comments on the other two articles that were of interest to me. I read both of these under the influence of Collins and Evans (2008, hereafter "CE:"), which I have also mentioned here.

First, Larry Sanger's piece on The Fate of Expertise after Wikipedia is composed of two parts. In the first, the author responds to various interpretations of what he calls "The Wikipedia Potential Thesis" (WPT), whereby if Wikipedia fulfills its highest potential in terms of measurable quality, "experts would not need to be granted positions of special authority in order for humanity to have a resource that accurately tracks expert opinion." I think this is a bizarre thesis that no one has actually put forward. After some philosophizing, and given that Wikipedia is dependent upon expert (CE:contributory) knowledge, Sanger concludes this thesis is untenable. I agree. While Wikipedia might be sufficient in providing CE:interactional expertise (knowledge of -- not ability to do -- science) and might threaten other interactional experts (i.e., journalists), it would not obviate CE:contributory expertise. He also argues that Wikipedia is successful not because of anonymity, but because of its freedom -- permitting him to claim Citizendium is just as wiki-like and powerful as Wikipedia, but better in that real-name identities support community, governance, and quality. This is an argument he's made before, and one I largely agree with. Had Wikipedia started with the requirement that people log in with an identity that corresponds to some real-world identity -- and this need only be policed in cases of abuse -- I think it would've done just fine.

Second, I most enjoyed P.D. Magnus' On Trusting Wikipedia. After reviewing literature on the reliability of Wikipedia, and arguing that Wikipedia is not like Britannica, the author posits five means by which reliability might be ascertained. The first three means correspond to types of meta-expertise in Collins and Evans: authority (reliable source; CE:local discrimination), plausibility of style (CE:technical connoisseurship), and plausibility of content (CE:ubiquitous discrimination). The remaining two have no direct corresponding type in Collins and Evans: calibration (testing a subset of the author's claims) and sampling (testing a single claim with another expert, i.e., a second opinion). The author concludes that in the case of Wikipedia, none of these indicators is particularly strong. But I find his objection to authority (i.e., check your sources, as implied by WP:Verifiability) rather weak; he argues that sources are unreliable, as are Wikipedia articles, since they are dynamic and can change. That is why one should use the permanent link (dated and versioned) when referring to something on the Web.

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

The elusive Jimmy Wales

Like others, I have been surprised that the Wikipedia policies of No Original Research (WP:NOR, Wikipedia 2006nor) and Verifiability (WP:V, Wikipedia 2006v) had been collapsed into a new policy of Attribution (WP:ATT, Wikipedia 2007a). The two former policies, in addition to Neutral Point of View (WP:NPOV, Wikipedia 2006npv), have been essential for understanding and explaining Wikipedia collaborative culture -- even more so than the Trifecta (Wikipedia 2006pt) and The Five Pillars (Wikipedia 2006fp).

But Wikipedia is huge, and it's easy to miss something even as important as this, so last week I updated my dissertation to read:

The second policy of Attribution requires, in a nutshell, that "All material in Wikipedia must be attributable to a reliable, published source" (Wikipedia 2007a). In a manner of speaking, this second policy is relatively new -- becoming "official" in February 2007 -- because it incorporates and supersedes the two long-standing policies of No Original Research (WP:NOR, Wikipedia 2006nor) and Verifiability (WP:V, Wikipedia 2006v).

Yet, for the dissertation, I concluded that having these two ideas remain distinct was useful to me in my writing and I would continue referring to them even if it now required a parenthetical comment about their merger. Evidently, Wales (2007wvw) agreed, as he wrote yesterday:

The change was made before a sufficient process had taken place to make the change, with the result that many good editors were unaware that such a fundamental change was about to take place. Many have reported being baffled and unhappy with the change.

However, his intervention with a "rejection of [[WP:ATT]]" (Wales 2007jrw) prompted two threads of interest to me: to what extent does WP:NOR act as a proxy for Notability, and "Just what is Jimbo's role anyway?" (Bennett 2007). In writing about leadership in Wikipedia and other open content communities, I have wondered the same, and now some were pressing for an explicit enumeration of Jimbo's powers. Others engaged in the perennial question of whether this role is more like that of a dictatorship, ministership, presidency, or monarchy. Stephen Bain (2007tdv) has posted a thoughtful argument that constitutional monarchy is the most apt, something Wales himself has said in the past:

But we have retained a 'constitutional monarchy' in our system and the main reason for it is too support and make possible a very open system in which policy is set organically by the community and democratic processes and institutions emerge over a long period of experimentation and consensus-building. No one needs to be afraid that VfD will be hijacked, and our rules turned against us. (Wales 2005nnw1)

And this brings me, finally, to the point of this essay: the elusive Jimmy Wales. I am not sure if this is a feature of other auctorial leaders (e.g., Linus Torvalds, Guido van Rossum, Larry Wall, etc.), but it is a sometimes frustrating and seemingly useful characteristic of Wales, who commented on the ambiguity of his role as follows:

I think the limits on my power are quite a bit unknown for a few reasons, mainly that I really don't exercise power all that much, ever, and so most questions of what I could do just simply don't come up. And passing a priori laws against me seems rather injudicious since our community institutions are all quite carefully limited for good reasons in an effort to create an atmosphere of calm loving respect. (Wales 2007jwi1)

This ambiguity is one reason why I find the statist notions somewhat inappropriate and place my theory of leadership in the lineage of emergent leadership (Yoo and Alavi 2003).

Wales has followed this strategy from the start, once characterized, to the frustration of Wikipedia cofounder Larry Sanger (2005), as a good cop rarely wielding a big stick:

In retrospect, I wish I had taken Teddy Roosevelt's advice: "Speak softly and carry a big stick. Since my "stick" was very small, I suppose I felt compelled to "speak loudly," which I regret. (This was not such a problem, by the way, on Nupedia; partly, that was because there were not nearly as many problem users on Nupedia, but partly it was because there was clear enforcement authority.) As it turns out, it was Jimmy who spoke softly and carried the big stick; he first exercised "enforcement authority." Since he was relatively silent throughout these controversies, he was the "good cop," and I was the "bad cop": that, in fact, is precisely how he (privately) described our relationship. Eventually, I became sick of this arrangement. Because Jimmy had remained relatively toward the background in the early days of the project, and showed that he was willing to exercise enforcement authority upon occasion, he was never so ripe for attack as I was.

This elliptical approach serves Wales well, and I think it is appropriate to the many challenges he faces, but it is also a challenge for writing about Wikipedia. For example, in explaining an inspiration for Wikipedia, Wales (2005nt) has acknowledged his debt to Hayek's "'On the Use of Knowledge in Society' as a pivotal essay in guiding my own thinking on topics like decentralization, knowledge, and society." But Wales has also resisted (Wales 2005wew) the idea that The Wisdom of Crowds (Surowiecki 2004) is a factor in Wikipedia dynamics. This isn't trivial for me to reconcile, and I can only do so by reading the latter statement not as a disavowal of social emergence, but as a purposeful shift in focus to the community and culture of Wikipedia. (That, too, is my own belief: there are underlying emergent dynamics, but don't forget the collaborative culture!)

Or consider that Wales (2005wdm) vehemently disagreed with Seigenthaler's claim (AP 2005) that because Wikipedia allows anyone to edit, it permits vandalism. Yet six months later, in order to limit such vandalism, Wales argued for what was popularly understood as a new blocking feature, so that those logged in from an otherwise blocked IP address could still edit. Wales (2006nyt1) argued this was not a restriction:

Openness refers not only to the number of people who can edit, but a holistic assessment of the entire process. I like processes that cut out mindless troll vandalism while allowing people of diverse opinions to still edit. Those are much better than full locking.

"Holistic," "elliptical," "elusive"... and a challenge in writing about Wales!

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Can you trust the Wikipedia?

In the past week the perennial question of "Can you trust the Wikipedia?" arose while I was working on the tedious -- though oddly compelling for an obsessive like myself -- task of reviewing the early period of Wikipedia history. I slowly worked through the Wikipedia timeline ensuring each event was dated and sourced. I realize that if I'm ever to trust this timeline, I need more than a bald claim. And, my appreciation is so much greater when I can peruse the primary source. For some sources, such as the Nupedia list archives, I was able to find copies of messages on the Internet Archive. Another source, Jimbo's explanation about Stallman's proposal for a competing project, is seemingly lost forever. Fortunately, Stallman was kind enough to tell me of his recollection of the incident and allow me to publish it. Most frustratingly, I encountered a tantalizing mention of Internet encyclopedia proposals from the UN's Millennium Project but failed to find any source or corroboration; that information is stricken from the article. Which brings me back to the question of trusting the Wikipedia. I have addressed the broader question of epistemological authority before, but now I want to focus on the role of sources.

Simply, Wikipedia is only as trustworthy as its links. Actual scholarly authority is similar. A critical part of scholarly training is learning why and how to cite (link to) others. Expert authority is also generated from experience in the field, and from theoretical and methodological training. Yet, as I've noted many times, "'We can never know everything.' We all can't be experts on everything, so we often need to rely upon credible authority while remaining critical and skeptical, but never dismissive." Consequently, the tokens "Ph.D." and "professor" become proxies for an assessment of trust that very few people are able to substantively test, but to which many are willing to defer. Because Wikipedia lacks such reputation mechanisms, Wikipedia is, again, only as trustworthy as its links. For educational purposes, the implication of this is profound. Should we teach students to trust a claim simply because it was uttered by a credentialed person? Or should we encourage them to click a link and teach them how to investigate for themselves?

What this means for Wikipedia culture is that it doesn't link enough. Perhaps my experience with Wikipedia history is exceptional, since Wikipedians take the sources for granted. But, as I found, that's a poor historical assumption. I also share the concern that articles might become overly busy or dense with citations. There is a tension here, but one I think the technology can handle. This is why I believe the trustworthiness of Wikipedia depends in part upon the citation project and on furthering a culture of "if you claim, you cite," as implied by the Verifiability policy.

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Britannica love

In Harvey Einbinder's excellent "The Myth of the Britannica," he includes some of the advertisements used to sell Britannica around 1960, including this one: "HOW CAN YOU EXPRESS THE INEXPRESSIBLE LOVE YOU FEEL FOR YOUR CHILD?" The actual copy, attributed to Dr. D. Alan Walter, is not by an eminent child psychologist or educator, but by one of the Britannica's salespersons.

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Auctorial Leadership?

A few days ago, while walking home from the local library, I recalled an expression I learned in a class on early Christian history: primus inter pares. This notion was used by early church leaders (e.g., the Bishop of Rome, now the Pope) and present-day patriarchs to indicate a status of "first among equals." Perhaps this could help me with my question of what to call benevolent dictatorship in open content communities. But the sentiment wasn't quite right, and it would be difficult to coin a term out of that Latin expression. But as I followed links from the primus page I encountered the terms "patriarch," "ethnarch," "archons" and finally "auctoritas."

The Oxford Classical Dictionary defines patrum auctoritas as: "the assent given by the 'fathers' (patres) to decisions of the Roman popular assemblies. The nature of this assent is unclear, but it may have been a matter of confirming that the people's decision contained no technical or religious flaws. The 'fathers' in question were probably only the patrician senators, not the whole senate..." (Momigliano and Cornell 2003). Auctoritas is the Latin root of English words authority and author. Given that "benevolent dictators" are often the founding author of open content projects, it seems appropriate. (In the Internet standards context, I spoke of "elders.") While I was convinced for a time my term would include the root "arch," for "ruler," the more I read of auctoritas the more I liked it.

Additionally, the form of power inherent in auctoritas fits my notion of leadership. It is not a coercive order but a recommendation with a normative force based on the prestige and charisma of a leader. Theodor Mommsen wrote of it as a force that is "more than an advice and less than an order: it is an advice whose compliance it is not easy to evade..." (Mommsen, as cited in Lottieri 2005:15). Lottieri concludes his discussion of the notion by writing:

For all these reasons we can say that auctoritas wa[s] on the edge between the legal world and the social life, the beliefs, the customs. It is in condition to influence the decisions by its prestige. Therefore, people refusing the auctoritas can ignore it, but they know that by the decision they are out of the community. (Lottieri 2005:15).

And this dovetails into the possibility of forking!

So, I find the term to be a surprisingly good fit. Now I need to figure out how to pronounce "auctorical" or "auctorial," or maybe even "authorial," leadership. Is this too awkward?

January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

Wikipedia and Astroturfing

Clay Shirky notes a cycle of references created by a few people with the effect of promotion: a Wikipedia article for Symphony OS is referenced in a Slashdot article, which is then noted in the Wikipedia article:

This is an interesting kind of spam, or maybe we could call it a reputation hack.... They create a Wikipedia page, point to it as if to demonstrate independent interest for the project in their potential slashdot post, then point to the slashdot effect on the Wikipedia page as proof of said independent interest. Voila, an instant trend.

The Symphony Talk page reminds me very much of one of the Lamest Edit Wars Ever over very similar issues with SkyOS: "Fast & furious kindergarten catfight with accusations of GPL violations, advertising, lying and fanboyism." One difference is that Symphony is actually Free Software, so while there is an argument about advertising, implied dishonesty and fanboyism, the GPL hasn't been an issue -- yet! (Accusation of GPL violation sometimes strikes me as similar in some sense to Godwin's Law; while license violations may be a substantive accusation, the discourse has no doubt gotten heated by then.)

Shirky also thinks that referencing the consequent Slashdot Effect on the Symphony OS site doesn't merit inclusion. (Personally, I don't mind, and I don't read the Slashdot Effect as a reciprocative authority.) After Shirky removed it, EliasAlucard reverted the removal, commenting "Why is trivia being removed by that anon user 'Clay Shirky?' As far as I'm concerned, he has nothing but distaste for this article, and his edits shouldn't be reckoned with." Unfortunately, here and on the Symphony Talk page EliasAlucard is not representing himself -- nor the article -- well and is failing to uphold numerous Wikipedia norms of good faith and writing for the enemy. (In Wikipedia, we encourage folks to try to see the perspective of the other, not write them off.) Also, the Slashdot effect claim is without attribution and citation of evidence. So I've included that link at least.



January 30, 2010 03:54 AM

Joseph Reagle on Wikipedia

English Wikipedia's Three Millionth Article

There's a tradition at Wikipedia of predicting when a particular milestone will be reached. Earlier expectations about Wikipedia were laughably conservative. While people have been guessing topics (rather randomly), there's sadly no page for the three millionth article. Given that since May the English Wikipedia has been growing at a rate of ~1,300 articles a day, I expect Wikipedia will hit this milestone in one week!
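The one-week estimate is a simple linear extrapolation. The post does not give the article count at the time, but at ~1,300 articles a day a one-week horizon implies a gap of roughly 1,300 × 7 = 9,100 articles. A minimal sketch of the same back-of-envelope arithmetic, assuming growth stays constant (both function names are hypothetical):

```python
def days_to_milestone(current, target, rate_per_day):
    """Days until `target` articles, assuming constant linear growth."""
    remaining = target - current
    # Already past the milestone: nothing left to wait for.
    return max(remaining, 0) / rate_per_day


def implied_gap(rate_per_day, days):
    """Articles still missing if the milestone is `days` away at `rate_per_day`."""
    return rate_per_day * days


gap = implied_gap(1300, 7)
print(gap, days_to_milestone(3_000_000 - gap, 3_000_000, 1300))
```

A real forecast would also need to account for day-to-day variation in the creation rate, which a single average hides.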

January 30, 2010 03:54 AM

January 29, 2010

Connected Action

Path and Component Metrics, new in NodeXL v.1.0.1.109

NodeXL has updated again (v.1.0.1.109) with new network metrics. The application now calculates path length data for your network, reporting the Maximum Geodesic Distance and the Average Geodesic Distance. The list of overall metrics NodeXL creates includes: Vertices (the number of nodes in the graph); Unique Edges; Edges With Duplicates; Total Edges; Self-Loops (edges that point back at the node from which they originate); Connected Components (each set of nodes that are connected to one another but not to any other set of nodes); Single-Vertex Connected Components (all the “singletons” of just one node in a component); Maximum Vertices in a Connected Component (the size of the “giant” component); Maximum Edges in a Connected Component (the largest number of edges in any one component, usually the “giant” component); Maximum Geodesic Distance, or Diameter (the longest of the shortest paths between any two nodes in the graph); Average Geodesic Distance (the average shortest-path distance between two nodes in the graph; compare this to the “six degrees” standard); and Graph Density (the ratio of the edges actually present to the number of edges the network could possibly have).
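To make these definitions concrete, here is a minimal Python sketch that computes several of the overall metrics for a small undirected graph using breadth-first search. It is not NodeXL's implementation: the function name `graph_metrics` is hypothetical, the graph is assumed undirected, the average geodesic distance is taken over ordered pairs of distinct, mutually reachable vertices, and Edges With Duplicates is not modeled. NodeXL's own conventions may differ.

```python
from collections import defaultdict, deque


def graph_metrics(vertices, edges):
    """Overall metrics for an undirected graph (hypothetical helper).

    vertices: iterable of hashable node ids (isolated nodes allowed)
    edges: list of (u, v) pairs; duplicate edges and self-loops allowed
    """
    nodes = set(vertices)
    for u, v in edges:
        nodes.update((u, v))
    # An undirected edge is identified by its unordered endpoints;
    # a self-loop collapses to a one-element frozenset.
    unique = {frozenset((u, v)) for u, v in edges}
    links = [tuple(e) for e in unique if len(e) == 2]

    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)

    def bfs(source):
        """Geodesic (shortest-path) distance from source to each reachable node."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return dist

    components, seen = [], set()
    diameter, total, pairs = 0, 0, 0
    for s in nodes:
        dist = bfs(s)
        if s not in seen:  # first time this component is reached
            seen.update(dist)
            components.append(set(dist))
        for d in dist.values():
            if d > 0:  # skip the zero distance from s to itself
                total += d
                pairs += 1
                diameter = max(diameter, d)

    n = len(nodes)
    return {
        "vertices": n,
        "unique_edges": len(unique),
        "total_edges": len(edges),
        "self_loops": sum(1 for u, v in edges if u == v),
        "connected_components": len(components),
        "single_vertex_components": sum(1 for c in components if len(c) == 1),
        "max_vertices_in_component": max((len(c) for c in components), default=0),
        "max_edges_in_component": max(
            (sum(1 for e in unique if e <= c) for c in components), default=0),
        "max_geodesic_distance": diameter,
        "avg_geodesic_distance": total / pairs if pairs else 0.0,
        # Distinct non-loop edges over the n(n-1)/2 possible node pairs.
        "graph_density": 2 * len(links) / (n * (n - 1)) if n > 1 else 0.0,
    }
```

Running BFS from every vertex is O(V·(V+E)), which is fine for the modest networks typically explored in a spreadsheet but would need sampling or a dedicated library for very large graphs.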

More metrics and details on existing metrics are on the way!

What metrics do you need?




by Marc Smith at January 29, 2010 09:23 PM

Connected Action

Talk at Israel Internet Association on February 22, 2010


The Annual Meeting of the Israel Internet Association (http://www.isoc.org.il (English)) is being held February 22-23 2010. I will be speaking at this year’s meeting: http://www.isoc.org.il/conf2010/agenda.php?lang=en

The previous year’s conference website is at: http://www.isoc.org.il/conf2009/program.php

The Israel Internet Association is the official Israeli chapter of the Internet Society. Its annual meeting is a central event for academics (sociologists, psychologists, and business and law scholars) as well as industry participants from sectors including mobile cellular companies and internet service providers.

My talk title: Analyzing Internet social media: visualizing social networks in (mobile) computer networks
Abstract: Social media systems on the Internet are sociologically interesting: why do some online groups succeed where others fail? How do different collections of online media and populations of authors differ from one another? How do patterns of contribution vary, and how do these differences illustrate the roles people play within their communities? Several visualizations of patterns of contribution and connection in a range of Internet social media, including web boards, enterprise social network services, and personal email, are presented to illustrate the range of variation among social media repositories and between types of contributors. These images suggest that a more comprehensive overview of social media can generate sociologically relevant findings, improve community management tasks, and provide features that can improve search and ranking of user-generated content. A freely available tool, NodeXL, will be demonstrated performing basic social media analysis tasks. Extending these tools to include mobile social software (“mososo”) data sets is a major new direction. In the not-too-distant future, mobile devices will possess a range of sensors and become more “socially aware”. When phones routinely notice each other, the nature of social interaction will change dramatically. How will places and locations change when machines become socially aware? In this talk, sociologist Marc Smith, Chief Social Scientist for Connected Action Consulting Group, a provider of social media analysis platforms and services, will describe these new technologies and some ways of thinking about their implications.



by Marc Smith at January 29, 2010 09:00 PM

Complexity and Social Networks Blog

Army Research Lab Funds Network Science, Big Time.



Army Research Lab invests in network science research






(The complete list of everyone associated with any of the four centers can be found here. Many, many well-known network scientists are part of this immense project. I am flattered to be part of the group. I think amazing things will be done with this funding. What follows is an ARL press release about the project. SW)

Dec 11, 2009

By Sarah Maxwell (Army Research Laboratory)



The Army Research Laboratory will be investing up to $166 million over the next five years to bring government, industry and academic institutions together to advance the Army's network capabilities.

Focusing on the new and growing area of network science, ARL officials announced in late September that they are awarding the money to a consortium of institutions to create four centers to conduct research in the information, social-cognitive and communications network areas.

"This is the first project looking at the social interaction, information distribution and mobile ad-hoc network as a whole," said Dr. Jay Gowens, Computational and Information Sciences Directorate director.

Bringing these three areas together will allow researchers a much more comprehensive understanding of network science, said Gowens.

The ultimate goal is to develop a scientific foundation for modeling, designing, analyzing and predicting the behavior of very large networks of humans interacting with each other, said Gowens.

BBN Technologies will focus on network integration; University of Illinois at Urbana-Champaign will research information; Pennsylvania State University will explore communications; and Rensselaer Polytechnic Institute will delve into social-cognitive network research.

Each lead organization will work with additional partners, and the overall focus is expected to substantially enhance the future warfighter's network communication capabilities, according to Gowens.

"The Army is moving rapidly and ever-deeper into a network-centric world. So much now depends on how warfighters and sensors and weapons communicate information through mobile, self-forming, rapidly-changing networks," said Dr. Alexander Kott, ARL's Network Science Division chief, who manages the alliance. "Here, we see the same three intertwined types of networks: social-cognitive (warfighters), battlefield information, and communication nets."

Network science is a burgeoning field that is still very young and requires much more research to understand how to apply it most effectively, said Kott.

"It was only a few years ago that scientists realized that networks of all kinds--biological, social, computer--are in a unique class of creatures, which live their own mysterious lives," said Kott. "They evolve, change, behave in little-known ways and all this is very important to understand and to study."

ARL received eighteen proposals and selected four of them for the award because they provided the best value to the government, said Army Research Office's Patricia Fox, chief contracting officer on the project.

The Network Science Collaborative Technology Alliance (NS CTA) is just one part of ARL's comprehensive network science research, which incorporates both new and existing ARL research activities, blending them into a coherent program, said Gowens.

ARL's existing programs include the Mobile Network Modeling Institute and the Network and Information Science International Technology Alliance. Other programs being developed include the Network Science and Technology Research Center and the Cognitive and Neuroergonomics Collaborative Technology Alliance.

January 29, 2010 07:28 PM

Data Mining: Text Mining, Visualization and Social Media

Malaysian Blogosphere Division

(Briefly) I've been working with some new data that our team has produced and created the view below. What I find striking about this visualization of 6k blogs is the clear division between two major clusters. A very limited drill down on the data suggests that all of these blogs are Malaysian in origin (and most are on Google's Blogspot). I don't yet have enough insight into this component to understand why there is a split - perhaps more to follow.


by Matthew Hurst at January 29, 2010 06:03 PM

Data Mining: Text Mining, Visualization and Social Media

A Different Way to Think About Apps

I use an app to check on the skiing conditions at local slopes. I click on the icon on the iPhone, the app pops up and I see some data. When I'm on a desktop, I do exactly the same thing, except the app I click on is a web page. While Apple claims 140,000+ apps available for its phone, and others in the space make similar claims, a good number of these apps are really just thin clients backed by the same sort of data that usually goes to populating a web page.

This prompts an obvious question: how many 'apps' are really just thin clients backed by web servers similar to those for traditional browsers? An initial answer might be constructed from a breakdown of app categories (games, for example, are less likely to fit this model).

In addition, as we hear more and more about crawling the deep web, what are the rules of engagement for crawling the data services that back these thin client, browser-like apps?

by Matthew Hurst at January 29, 2010 01:30 AM

January 28, 2010

UMBC Ebiquity

Global Game Jam at UMBC, January 29-31

UMBC will be the Baltimore site for the Global Game Jam. This is a 48 hour event, where teams from around the globe will work to each develop a complete game over one weekend. Last year, the UMBC site fielded five teams as one of 54 sites in 23 countries. This year promises to be even bigger, with 124 sites in 34 countries.

The Baltimore site is open to participants at all skill levels. It is not necessary to be a UMBC student to register. Thanks to generous support by Next Century, there is no registration fee for the Baltimore site, but you must register for this site in advance at www.globalgamejam.org. The jam will start at 5PM on Friday, January 29th in the UMBC GAIM lab, room 005a in the ITE building. At that time, the theme for this year’s games will be announced, and we’ll brainstorm game ideas and form into teams. Teams will have until 3pm on Sunday, January 31st to develop their games. We’ll have demos of each game and selection of local awards, wrapping up by 5pm Sunday.

Last year’s theme was “As long as we’re together there will always be problems”, and we had games developed using a combination of XNA, Flash, Maya, Photoshop, and the Unity Engine.

For more information, visit http://gaim.umbc.edu/jam/.

by Tim Finin at January 28, 2010 08:31 PM

Data Mining: Text Mining, Visualization and Social Media

Google Updates Whistler Aerial Imagery with Snow

I just noticed that Google Maps' aerial imagery of Whistler (host of the 2010 Winter Olympics) is now nice and white. This is definitely a change (other ski locations I visited have not been updated with winter imagery).


by Matthew Hurst at January 28, 2010 06:07 PM

Data Mining: Text Mining, Visualization and Social Media

Visualizing Network Evolution

New to me is this work by Bergstrom and Rosvall on discovering and visualizing changes over time in networks. The basic idea is to track nodes as they join and migrate between clusters in a network.


Some additional visualizations are available here.
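Part of the bookkeeping behind such a diagram is a count of how many nodes move from each cluster at one time point to each cluster at the next; those counts set the widths of the streams between snapshots. The Python sketch below tabulates only that. It assumes cluster assignments for each snapshot are already given, and it is not Bergstrom and Rosvall's method, which also covers finding the clusters and judging which changes are significant. The function names `cluster_flows` and `migrants` are hypothetical.

```python
from collections import Counter


def cluster_flows(before, after):
    """Count nodes moving between clusters across two snapshots.

    before, after: dicts mapping node -> cluster label at each time point.
    A node missing from a snapshot gets the label None, so (None, c) counts
    newcomers to cluster c and (c, None) counts nodes that left cluster c.
    """
    nodes = set(before) | set(after)
    return Counter((before.get(n), after.get(n)) for n in nodes)


def migrants(before, after):
    """Nodes present in both snapshots whose cluster label changed."""
    return sorted(n for n in set(before) & set(after) if before[n] != after[n])


t1 = {"a": "X", "b": "X", "c": "Y"}
t2 = {"a": "X", "b": "Y", "c": "Y", "d": "Y"}
print(cluster_flows(t1, t2), migrants(t1, t2))
```

Cluster labels are arbitrary at each time point, so in practice clusters must first be matched across snapshots (for example by maximum overlap) before a flow like ("X", "Y") can be read as a genuine migration.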

 

by Matthew Hurst at January 28, 2010 05:04 PM

January 27, 2010

Connected Action

Node and Venn: NodeXL can create Venn Diagrams!

NodeXL updated starting with version 1.05 with features that make it fairly easy to create basic “Venn Diagrams”.  A Venn diagram is a familiar way to illustrate the overlap (or lack thereof) of two or more “sets” of things.

There are some very amusing Venn diagrams out there!  This one in particular made me laugh but I may be dating myself.

The Venn diagram feature is a special request from the Microsoft Biological Foundation group.

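A Venn diagram is related to but different from an Euler diagram: a Venn diagram draws a region for every possible overlap of the sets, even an empty one, while an Euler diagram shows only the overlaps that actually occur. An “n-Venn” diagram is a collection of n closed curves (“circles”) on a plane arranged so that every combination of the curves overlaps. The most basic Venn diagram has just two circles, but more complex diagrams can have more. A 2-circle Venn diagram has 3 regions (A, B, AB) and a 3-circle Venn diagram has 7 regions (A, B, C, AB, AC, BC, ABC), not counting the area outside all the circles.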
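The region counts follow from treating each region as one non-empty combination of the n sets, which gives 2^n - 1 regions (3 for two circles, 7 for three), not counting the area outside every circle. The Python sketch below enumerates those regions and, as an illustration of going from set data to a diagram, counts how many elements fall in each exclusive region. The function names are hypothetical, and this is not NodeXL's Venn feature, which per this post has users place the circles by X/Y location.

```python
from itertools import combinations


def venn_regions(names):
    """Every region of an n-set Venn diagram: one per non-empty
    combination of sets, i.e. 2**n - 1 regions."""
    return [combo
            for k in range(1, len(names) + 1)
            for combo in combinations(names, k)]


def region_sizes(sets):
    """Count the elements that fall in each *exclusive* region.

    sets: dict mapping set name -> Python set of elements.
    An element in A and B but not C lands in region ("A", "B").
    Regions with no elements are kept with a count of 0.
    """
    names = list(sets)
    sizes = {region: 0 for region in venn_regions(names)}
    for element in set().union(*sets.values()):
        # Names are listed in input order, matching the combinations() keys.
        region = tuple(n for n in names if element in sets[n])
        sizes[region] += 1
    return sizes


print(venn_regions(["A", "B", "C"]))
print(region_sizes({"A": {1, 2, 3}, "B": {3, 4}}))
```

Keeping empty regions with a count of 0 mirrors the Venn (rather than Euler) convention of drawing every possible overlap.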

A Survey of Venn Diagrams can be found at http://www.combinatorics.org/Surveys/ds5/VennEJC.html.

Our implementation is a bit of a hack: we basically let you define the X/Y location of 3 circles. A richer Venn tool would make it easy to take set data and define these circles. We may get that implemented in the coming months.




by Marc Smith at January 27, 2010 07:00 PM

January 25, 2010

Mike Love - influence and visualization

bio_animation



The biomedical animations from the Walter and Eliza Hall Institute are available on YouTube. Except for some unnatural electronic sound effects and glowing auras, they seem more detailed and less cartoonish than other videos I’ve seen.

by mikelove at January 25, 2010 11:40 PM

John Breslin's Cloudlands on social software

Book launch for “The Social Semantic Web”

We had the official book launch of “The Social Semantic Web” last month in the President’s Drawing Room at NUI Galway. The book was officially launched by Dr. James J. Browne, President of NUI Galway. The book was authored by myself, Dr. Alexandre Passant and Prof. Stefan Decker from the Digital Enterprise Research Institute at NUI Galway (sponsored by SFI). Here is a short blurb:

Web 2.0, a platform where people are connecting through their shared objects of interest, is encountering boundaries in the areas of information integration, portability, search, and demanding tasks like querying. The Semantic Web is an ideal platform for interlinking and performing operations on the diverse data available from Web 2.0, and has produced a variety of approaches to overcome limitations with Web 2.0. In this book, Breslin et al. describe some of the applications of Semantic Web technologies to Web 2.0. The book is intended for professionals, researchers, graduates, practitioners and developers.



Some photographs from the launch event are below.


by Cloud at January 25, 2010 10:53 AM

Data Mining: Text Mining, Visualization and Social Media

New Google UI Fit and Finish

As I tweeted today, I noticed something wrong with the Google landing page I was getting: the advanced search link mistakenly links back to www.google.com, not the advanced search page. On reflection, I suspect this may be due to the flighting of a new UI.


This UI is similar to the one Tom Krazit spotted back in November, but different in colouring. Going via an anonymous proxy, I get the normal fit and finish.

Update: note also that the web options (the left rail facets) are now there by default, whereas in the standard UI they are collapsed and require expansion by the user.

In addition, the actual landing page has a different design for its buttons and the shadow on the main Google logo has all but gone.

by Matthew Hurst at January 25, 2010 04:49 AM

January 24, 2010

Data Mining: Text Mining, Visualization and Social Media

Infostate of Africa

AppAfrica has an interesting post showcasing an infographic describing the state of the internet in Africa: Infostate of Africa. I took the liberty of uploading this to SeaDragon which allows for better browsing and appreciation of this type of resource. Note that SeaDragon doesn't get the centering of the image correct, so hit the home icon to see the full thing.

[HT Nathan]

by Matthew Hurst at January 24, 2010 06:26 PM

January 23, 2010

Data Mining: Text Mining, Visualization and Social Media

West Seattle Blog going Mobile

I just noticed that the West Seattle Blog, a well-established hyperlocal blog in Seattle, has either rolled out, or is testing, a new mobile theme.




At the bottom of the page is a button to switch the mobile setting off, but it doesn't appear to have any effect, so I suspect what we are seeing is the testing of a new mobile site for the blog and that they are experiencing a few glitches. Strangely, when I visit it from my iPhone, I get the non-mobile version of the site.

See my related post on 2010 - The Year of the Neighborhood.

by Matthew Hurst at January 23, 2010 06:29 PM

January 21, 2010

Data Mining: Text Mining, Visualization and Social Media

2010 - The Year of the Neighborhood

As the geosocial revolution continues - creating more and more intimate links between the digital space and our physical spaces via mobile devices and data driven services - the word 'neighborhood' is becoming more and more prominent. A neighborhood (in urban terms, larger than a block, smaller than a zipcode) is the perfect granularity to connect with users as we spend a good chunk of our time there.

'Nearby' is often scoped by neighborhood: our schools define catchment areas at this level, and supermarkets serve neighborhood-sized portions of the population.

As we see the rise of geosocial gaming (things like Foursquare, Gowalla, MyTown), and the mechanisms they introduce being adopted by other spatially aware services (Yelp) we are also seeing the rise of the importance of real estate data. It is in no way surprising that Google is interested in the real estate market.

NabeWise is a new neighborhood review site, similar in some regards to both EveryBlock and Centerd. It has just opened its doors with coverage for New York and San Francisco, and its entry points are qualities of neighborhoods (trendy, singles, beautiful people, etc.). It is very interesting to note that the sign-up process includes the question 'are you a real estate agent?'

Design-wise, many of these sites have to address the presentation of rich data in an understandable and consumable manner. In this regard, these resident-oriented sites have similarities with real estate 2.0 sites (Redfin, Zillow, Trulia) and take advantage of the increased data literacy of a younger, Web 2.0-savvy audience.

Key to all of this is the bedrock data set of neighborhoods. As the LA Times demonstrated with its Mapping LA project, the definition of any neighborhood is somewhat subjective and borders need to be negotiated. For many cities, Wikipedia keeps rich pages describing neighborhoods and their histories.

From a UX point of view, we can expect to see more interfaces with elements like these, sampled from some of the companies above (NabeWise, Trulia, EveryBlock):

Nabewise

Trulia

Everyblock



by Matthew Hurst at January 21, 2010 02:21 PM

Augmented Social Cognition

What are big research problems in Social Web technologies?

Just finished reading Dion Hinchcliffe's piece over at ZDNet on emerging technologies for the Social Web in 2010. I have been reading all these different predictions to see how they relate to our research agenda. Dion's piece is long, but several points resonated with what we have been doing:



First, he said that one problem we have is
"Poor integration between social media and location services. Again, while there’s already some location awareness in social networking services today, there’s a long way to go before it’s integrated meaningfully into the social experience to provide real utility."
I agree wholeheartedly. Not too long ago, I participated in a research project here at PARC called Magitti, an activity recommender that modeled your content interests, your schedule, your location, and your personal history on the mobile device [1]. The integration of personalization and social features with location-aware services will be a significant trend in 2010, and there will be a lot of good research and products in this area.



Second, he said that people are having difficulties in
"coherently engaging in social activity across many channels. Tired of the day-long round-robin between your e-mail, SMS, Twitter, Facebook, and any other services you use to keep up with what’s going on? You’re not the only one. While aggregation services such as Friendfeed potentially cut down on the manual effort of using the social Web, it’s still not mainstream despite being a good example of what’s possible. Notably it’s often the big (and closed) social silos that are causing the problem."
Our group was an early adopter of FriendFeed, and realized that many of the issues relating to social annotation, commenting, and other interactions were due to the distributed nature of social media. It is hard to keep track of who said what, and the aggregate reactions to content. Our research group has some investments in this research problem, which relates to aggregation and the ability to browse and filter the feeds. We are about to publish a paper in CHI2010 about how to use faceted browsing techniques to partially solve this problem [2].



Finally, the most important point he made was about our need for
"Coping with and getting value from the expanding information volume of social media. We’re all learning how to deal with the firehose of information that flows out of social media on a minute-by-minute basis. Sometimes it’s hard to remember that this flow of transparent and open information is actually good and often useful and creates important conversations. But the simple fact is that much of it isn’t meant for non-stop, instantaneous consumption [emphasis added]; it simply isn’t practical. Rather, social media leaves behind artifacts and information that we can find and use later when we need them. But at the moment the process of sorting through, aggregating, and filtering the vast volume of information cascading through social media today remains a real and growing challenge. I also began to get the first real reports that this is happening in the enterprise last year as social media begins to grow there as well."


Here, the ASC group's investments in summarization, recommendation, personalization, etc., will hopefully pay off. Our investments have been in understanding how to apply these techniques to social media in particular, with the added social contexts and new data mining techniques around social streams. Research-wise, we will be pushing on this last point the most, and I believe it is also the area where we are most likely to extract user value. We are about to publish a paper at CHI2010 on how to do recommendations on the Twitter network [3].



I will blog about these research efforts soon.



----

[1] Bellotti, V., Begole, J. B., Chi, E. H., Ducheneaut, N., Fang, J., Isaacs, E., King, T., Newman, M., Partridge, K., Price, B., Rasmussen, P., Roberts, M., Schiano, D. J., Walendowski, A. Activity-Based Serendipitous Recommendations with the Magitti Mobile Leisure Guide. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI2008), Florence, Italy, pp. 1157-1166. ACM Press, 2008.



[2] Hong, L., Convertino, G., Suh, B., Chi, E. H., Kairam, S. FeedWinnower: Layering Structures over Collections of Information Streams. Accepted to ACM CHI2010.



[3] Chen, J., Nairn, R., Nelson, L., Chi, E. H. Short and Tweet: Experiments on Recommending Content from Information Streams. Accepted to ACM CHI2010.



by Ed H. Chi (noreply@blogger.com) at January 21, 2010 01:09 AM

January 20, 2010

Joseph Reagle on Wikipedia

Coleman on Hacker Cons

This week I've been reading the reports from Camp KDE 2010 and looking forward to attending a few hours of Wikipedia Day NYC. So it was a great pleasure to read Biella Coleman's "The Hacker Conference: a Ritual Condensation and Celebration of the Lifeworld". I haven't seen anyone else address this issue, but as a sometime participant in and scholar of related communities, I think she is right to highlight the importance of this venue. In my forthcoming book I note that in addition to virtual spaces "there are the physical spaces in which some community members interact."

Through the Wikipedia "meetups" I've attended in New York and the annual Wikimania conferences, I've met a couple dozen contributors. I've spoken to many of these people more than once, and it's quite easy to speak to a newly met Wikipedian about issues of concern to the community. These conversations were informative, but casual.

So, while formal face-to-face interviews played a very small part in my work, the opportunity to meet with people, to participate in conversations, to see playfulness and laugh at jokes was essential to interpreting what I saw happening online. In Biella's work I particularly appreciated the inclusion of some history (though I wanted more detail, including whether fandom conferences might've had any influence), and the account of how Debian Women arose in part out of the opportunity for face-to-face interaction.

Coincidentally, in the last year I have been particularly interested in questions of gender representation and participation at geek conferences. There were a number of instances in which the "playful" discourse of men was said to be predicated on sexist assumptions, and at the least had an alienating effect (e.g., Stallman, Aimonetti, Mouette). In fact, in a conversation with Biella this summer I noted that 2009 was probably the "Year of [Something]", where "something" connotes a greater gender consciousness or willingness to confront alienating discourse in open content communities, but I couldn't come up with a good word!

January 20, 2010 10:28 PM

Data Mining: Text Mining, Visualization and Social Media

Bing Maps Updated with More Coverage, Apps

Yesterday, Microsoft rolled out an update to our mapping product, Bing Maps. The update includes more street imagery coverage, including Vancouver and Whistler.

It also adds two new applications, Events and Destination Maps.

Destination Maps allows you to create a fun rendering of the map data for use in invitations, event planning, etc.

With this release, the product is no longer in beta, which means that more users will start seeing it by default. This heats up the competition in the mapping space and distinguishes Bing by virtue of its platform approach to mapping (ReadWriteWeb and others offer commentary).

  

  

 

 

by Matthew Hurst at January 20, 2010 04:22 PM

January 19, 2010

Augmented Social Cognition

A Study on Efficient Diffusion of News in an Organization

[joint work by Les Nelson, Rowan Nairn, and Ed H. Chi]



In our knowledge economy, an enterprise's competitiveness often depends on how efficiently important news travels to the right people at the right times. Knowledge workers now depend heavily on communication channels both inside and outside the enterprise to keep up to date on the most important information, such as the latest news on competitors, memos on human resources, the status of business proposals, and the progress of workflows. The efficiency of news spread in an organization determines not just how the organization might absorb and make sense of the information, but also how it might decide to respond and react.



For example, one study of how email impacts an organization showed that one piece of email may create an organizational footprint that is 30 times larger [1]. A large body of literature surrounds the issue of news flow in organizations, including information seeking, organizational memory, and expertise location. More specific to organizational information flow, sociological research shows that there is greater homogeneity of information within groups of people than between groups of people [2].



News in general is about the communication of current events, where the timeliness of the information is key. 'Timeliness' need not be limited to up-to-the-minute, 'breaking' news. For example, one interviewee in one of our recent studies said: "It's about the leading edge of something. Staying current in a professional sense, I go through bouts of finding information. And I share it". Within an organization, this may mean keeping up with information both for 'knowing what' is happening and for 'knowing how' to do things.



How can organizations better respond to the complex social and technical situation involved in staying current in their areas of business? With respect to news at work, what roles, tools, and practices might we expect in the brokering of news?



INTERVIEW STUDY



We recently conducted an interview study within our research organization. The company is an established research organization with approximately 200 staff members at a single location. Most employees belong to a group of roughly 5 to 10 people (we will call this a 'team'), and teams are organized into 4 larger multi-team groups. Each employee has an office, generally located near the rest of his or her group.



The company uses wikis as project and group knowledge repositories. The project wikis typically receive brief but intense activity (e.g., collecting web links on a topic), and then lapse into occasional use. Group wikis are updated infrequently, usually when there is organizational change (e.g., new projects and people). External blogs on topic areas promoted by the company are encouraged. Internal blogs receive infrequent use for general information sharing on topics of wide interest. Microblogging (e.g., Yammer.com) was tried early on but did not persist.



Participants were chosen from a range of positions and tenure with the company, including staff members involved in the primary business production, service people in support of the staff members (e.g., marketing, administrators, staff services), managers, and executive level managers.



Sixteen interviews were conducted in people's offices. Each started with a critical-incident-style interview about the most recent news events received, followed by explicit probing to elicit the different ways in which news arrives, how frequently such news arrives, and who was involved.



STUDY FINDINGS



We found a relatively mature practice of relying on the communication channels most commonly used at work, such as email and face-to-face conversation. News not only travels along social networks in the organization; there is also a strong effort to pass along news that is known to be relevant. People are conservative in their choices. Moreover, people tune their social networks to ensure they receive the appropriate news.



We found three major patterns in how people at the company receive and transmit news:



(1) Email is indeed the channel and medium of choice for news [3];



The figure below shows how frequently various ways of passing news back and forth were mentioned in the interview study. Although we find that news arrives and is diffused through many channels, with different levels of timeliness and audience, the primary means of communication are email (either directly or via company mailing lists) and face-to-face conversations in offices, hallways, and at lunch.







(2) News follows people's social/work networks, and there is a strong effort to pass along only news seen as relevant to others;



People filter news streams for their peers as part of their ongoing conversations at work. The filtering involves assessing the quality of the news, the time investment appropriate for relaying it, and its uniqueness:



One subject told us:
"I have to read it [news related email] to find out if it is unique enough. I do try to filter if it is worth forwarding. There is a huge quality assessment thing, because I would hate clogging people's streams. I would probably send it to people who are actually engaged in a conversation of this type."




(3) People structure their news networks so that news is conveyed along short paths to only the 'necessary, but sufficient' recipients. They do this by shaping the channel so that it produces quality news, by finding ways to avoid unnecessary communication, or by setting up the shortest paths.



For example, one subject said this about whom to follow on Twitter:
"I went through [lots of] phases. Imagine a spiral. I could overhear conversations and pick up derivative connections. Then it got to be a little overwhelming so I went and winnowed those down... and again. The people you follow dictate the information you get. And there were three factors. One is how informative or interesting they were to my interests. The second one was how frequently they updated. If they updated 50 times a day I couldn’t keep up with that. And the third reason is strategically, who I want to build a relationship with".






DESIGN IMPLICATIONS



From the findings above, we derive the following requirements for systems aimed at propagating news at work:



1. Integrate into the email habitat to maximize chances of adoption;



2. Also put news receivers in control. While email has its advantages, it is in some sense a sender-controlled system;



3. Allow targeting to continue but increase the chance of serendipitous but relevant connections in a way that keeps the social paths for news short and efficient;



4. Enhance the ability to target news to others without overloading email further;



5. Allow the emergence of shared interest spaces.





References



[1] IDC. 2008. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. IDC white paper.



[2] Burt, R. S. 2004. Structural holes and good ideas. American Journal of Sociology 110, 2 (September), 349–399.



[3] Ducheneaut, N., Bellotti, V. 2001. Email as habitat: An exploration of embedded personal information management. ACM Interactions 8, 5 (September-October), 30–38.



by Ed H. Chi (noreply@blogger.com) at January 19, 2010 10:12 PM

Complexity and Social Networks Blog

The Obama network in 2010

Last year I posed the question of what would become of the Obama network. Would it become a force for policy and political change, for example? This is the great undercovered political story of the last year. Techpresident, a wonderful blog on technology and politics, just issued a fascinating report on Organizing for America, the institutionalization of the organization/network/technology side of the Obama campaign. The story it paints is a mixed one: successful at keeping a nontrivial number of people engaged in an ongoing fashion (although this number is surely tiny relative to the number mobilized in the 2008 election); unsuccessful at mobilizing people toward policy goals (i.e., around health care reform).

Tomorrow, the special election for Senate in Massachusetts offers an interesting test of OFA. Clearly, OFA is all in, with their e-mails, the use of volunteer phonebanks, etc. Massachusetts must have among the highest densities of Obama contributors and volunteers in the country, many of them battle-hardened (from work in New Hampshire in 2008, not Massachusetts, of course).

But the Internet tide has shifted. Republican Scott Brown is the beneficiary of vastly more Internet-based support in the special election than Democrat Martha Coakley, and the question tomorrow (and beyond) is: to what extent did the 2008 election reflect the marriage of the medium and a man with a particular moment? Does the Internet just enable the bottom up mobilization of the passions of the moment, or does it also enable the institutionalization of mobilization?

We won't get the conclusive answer tomorrow, but the tea (party?) leaf I will be reading most closely is the extent of the mobilization of Obama supporters tomorrow. Obama still has reasonably strong support in Massachusetts; do they turn out? And the less visible story will be how many volunteers OFA mobilized to make calls, to contribute to Coakley, and so on. (On this last point, at least we can look at FEC filings in a few months to see the overlap between 2008 Obama contributors and Coakley contributors from January 10 to 18.)

January 19, 2010 12:40 AM

January 18, 2010

Complexity and Social Networks Blog

Sunbelt XXX: final call

A reminder to readers of this blog that Sunbelt, the meeting of the International Network for Social Network Analysis (INSNA), will take place in early July in Trento, Italy. The deadline for abstract submissions is tomorrow (January 19). This is a terrific meeting, perhaps my favorite, in a wonderful location.


January 18, 2010 11:23 PM