Planet Social Media Research


July 30, 2010

Data Mining: Text Mining, Visualization and Social Media

Tech Travels Japan

Having just returned from a quick trip to Japan, I thought I'd write up some thoughts and observations about it from a technology point of view.

All told, Japan is a far more technologically integrated country than any other I've visited. Much of this integration is born of necessity (population density) and of organic processes (what originated as an electronic money system for the railways has morphed into a general e-wallet accepted at many points of sale).

Transport: technology in transport includes automated ticket machines; turnstiles that recognize both of the major electronic wallet formats (which are also embedded in mobile devices); extremely precise timetable execution; and conductors carrying wireless, touchscreen ticket verification systems that are integrated with the carriages themselves (once they've verified your ticket, the light above your seat indicates it).

In addition to the Japanese side of the trip, traveling on Air Canada was a pretty up-to-date experience. The 777 had USB ports and power outlets at each seat. The touchscreen entertainment system was available for use at the gate (I was already 30 mins into a movie before we took off). On the downside, the video-watching experience appears to have morphed into an advertising channel; with the screen only inches from your face, the only way to avoid the ads was literally to look away.

Entertainment: all of the consumer electronic stores I visited in both Akihabara and elsewhere were full of 3D TV offerings from all the major flat screen manufacturers. The Sony Building in Ginza (which I understand is to be closed down in the near future) was transformed into a 3D aquarium with all of the floors featuring 3D technology and amazing videos of coral reefs, sharks, etc. Some TVs are boasting the ability to recognize the emotions of the human face, but I couldn't quite figure out what they were doing with the results!

Mobile Devices: I live in the Seattle area, which is probably a very biased sample of the US in terms of mobile device use. On the bus I take to commute, iPhone adoption is extremely high, as is Kindle and iPad use. In Japan, with a quite different sample of observations on public transport, iPhones were far less prevalent, with passengers tending to use devices with a physical keyboard. In addition, I only spotted one iPad (and that in the lobby of a hotel) and no other type of reading device. The Japanese have maintained the original form factor of the pocket paperback book (i.e. a paperback book you can actually fit in your pocket), and it was still clearly popular. Public telephone kiosks seem to be disappearing (though nowhere near to the extent they have in the US).

Search Engines: no-one seems to have heard of Bing, or to know that Microsoft has a search engine. It was big news when the partnership between Yahoo! Japan and Google was announced while I was there (Yahoo! Japan is not the same company as Yahoo!). Google is running billboard advertisements for its browser (Chrome).

Tech Corporations: Two well-known corporations in Japan (Rakuten and Uniqlo) have switched, or are switching, to English as their official corporate language. This is a pretty interesting change and highlights their international ambitions.

While technology is a big part of Japanese culture, much of it is used to support something that can't be packaged and automated - high quality customer service. The dedication and attention to detail one gets as a consumer or traveler in Japan is incredible and often not related to the amount one is paying.

by Matthew Hurst at July 30, 2010 02:07 AM

July 29, 2010

Connected Action

July 26-30 – Catalyst 2010 conference – Social Networks in the Enterprise

I will be speaking at the 2010 Catalyst Conference in San Diego on July 29th. The conference hashtag is #CAT10.

The slides are here:

A few days before the conference started the #CAT10 twitter social network map looked like this:

2010 - July - 26 - NodeXL - Twitter - #CAT10

26 July 2010 NodeXL Twitter map of the connections among people who tweet “#CAT10”, the hashtag for this year’s Catalyst conference.

2010 - July - 26 - NodeXL - Twitter - #CAT10 top between

This is the list of the most “between” contributors in the #CAT10 Twitter graph on July 26, 2010.

A few days later, as people began to arrive at the conference, the graph became far more dense and populous.

2010 - July - 29 - NodeXL - Twitter - #CAT10

The network of Twitter users mentioning #CAT10 has become much denser, with more people and more connections among them as people reply, retweet, follow, and mention one another.
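NodeXL builds these maps for you. As a rough illustration of how mentions and replies turn into a graph, and of what "denser" means, here is a minimal Python sketch that treats each @-mention in a tweet as a directed edge and computes directed graph density (observed edges divided by the n(n-1) possible ordered pairs). The function names and sample tweets are hypothetical, and follow relationships, which the post also counts as connections, are not modeled.

```python
import re

# Twitter usernames: letters, digits and underscores after an "@".
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Build directed (author -> mentioned user) edges from (author, text) pairs.

    Replies and retweets ("RT @user: ...") appear as @-mentions, so they
    become edges too. Self-mentions are ignored.
    """
    edges = set()
    for author, text in tweets:
        source = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != source:
                edges.add((source, target))
    return edges


def density(edges):
    """Directed density: observed edges / possible ordered pairs n(n-1)."""
    nodes = {user for edge in edges for user in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


tweets = [
    ("alice", "Heading to #CAT10 with @bob"),
    ("bob", "@alice see you there #CAT10"),
    ("carol", "RT @alice: Heading to #CAT10 with @bob"),
]
edges = mention_edges(tweets)
print(sorted(edges))
print(round(density(edges), 3))  # 0.667
```

This sketch only sees users who take part in at least one mention; users who tweet the hashtag without mentioning anyone would add nodes but no edges, and so would lower the density.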

2010 - July - 29 - NodeXL - Twitter - #CAT10 - top between list

While the core people in this list are similar to the list generated a few days earlier, several people have shifted position.

Filtering the graph, we can remove all but the most between people to reveal the core members of the community.
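NodeXL computes betweenness centrality for you. To make "most between" concrete, here is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph, followed by the filtering step described above (keep the top-k nodes and the edges among them). This is not NodeXL's implementation; `betweenness` and `top_between` are hypothetical names, and the four-node path graph is made up.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a set of neighbours. Returns raw (unnormalized)
    betweenness, with each unordered pair of endpoints counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected shortest path was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def top_between(adj, k):
    """Keep only the k most between nodes and the edges among them."""
    bc = betweenness(adj)
    keep = set(sorted(adj, key=lambda v: (-bc[v], v))[:k])
    return {v: adj[v] & keep for v in keep}


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(betweenness(adj))
print(top_between(adj, 2))
```

On the path a-b-c-d, the two middle nodes each lie on two shortest paths, so filtering to k = 2 leaves just b and c and the edge between them.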

2010 - July - 29 - NodeXL - Twitter - #CAT10 - top between only

These people are likely to play an influential role in the #CAT10 community.




by Marc Smith at July 29, 2010 07:15 PM

July 28, 2010

Data Mining: Text Mining, Visualization and Social Media

Amazing Physics Simulations from Lagoa

Vu Nguyen posts this amazing video from Lagoa.

Lagoa Multiphysics 1.0 - Teaser from Thiago Costa on Vimeo.

The company behind this video is somewhat elusive (a website with their name currently just hosts this video).

by Matthew Hurst at July 28, 2010 01:08 AM

July 27, 2010

Data Mining: Text Mining, Visualization and Social Media

Crowd Sourcing Butterfly Conservation

The BBC writes about an effort in the UK to use crowd sourcing to count the different types of butterflies: the Big Butterfly Count. Participants are asked to spend 15 minutes spotting butterflies and moths. The data, currently 5121 sightings (5866 a day later), is displayed on a map.

[Image: map of Big Butterfly Count sightings]

A couple of thoughts. Firstly, I think the data could be displayed in a far more engaging manner with a heat map of some sort, with the ability to show clusters of different species at least. The following is an inefficient way to show the data for a species:

[Image: sightings shown for a single species]
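The heat-map suggestion above boils down to binning sightings into grid cells and colouring each cell by its count. A minimal sketch, assuming each sighting is a (species, latitude, longitude) record; the 0.5-degree cell size, the function name and the sample records are illustrative, not taken from the Big Butterfly Count site.

```python
from collections import Counter

# Hypothetical cell size: 0.5 degrees of latitude/longitude per heat-map cell.
CELL_DEG = 0.5


def grid_counts(sightings, species, cell_deg=CELL_DEG):
    """Count sightings of one species per lat/lon grid cell.

    sightings: iterable of (species, latitude, longitude) records.
    Returns a Counter mapping (row, col) cell indices to counts, ready to be
    coloured as a heat map.
    """
    counts = Counter()
    for name, lat, lon in sightings:
        if name != species:
            continue
        # Floor division puts each point in the cell whose lower-left corner
        # is (row * cell_deg, col * cell_deg).
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[cell] += 1
    return counts


sightings = [
    ("Red Admiral", 51.5, -0.1),
    ("Red Admiral", 51.6, -0.2),
    ("Peacock", 53.4, -2.2),
    ("Red Admiral", 55.9, -3.2),
]
print(grid_counts(sightings, "Red Admiral"))
```

Running the same function once per species gives one layer per species, which is one way to show clusters of different species side by side.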



Secondly, I wonder if Twitter could be used in some way to channel the data; one could even tweet a picture to the project. That way, sightings could be verified and would arrive with their location and time attached.

Finally, and perhaps most importantly, the site suffers from the age-old problem of inadvertent-tab-ellipsis-renaming:

by Matthew Hurst at July 27, 2010 12:26 PM

July 26, 2010

Data Mining: Text Mining, Visualization and Social Media

Augmented Reality - 17 years from Concept to Product?

Almost twenty years ago, I recall coming across a paper (either in the AI library in Edinburgh, or the Computer Science library in Cambridge) which described an augmented reality approach to that most intractable of problems: fixing printers.

A number of forces have conspired to allow me to access a reference to that paper (Google's crawl/search, my memory being prompted repeatedly by augmented reality applications on mobile devices).

At any rate, I suspect the image below, from a document with a 1993 time stamp, may be one of the earliest incarnations of augmented reality: Feiner, S., MacIntyre, B., and Seligmann, D. (1993). "Knowledge-Based Augmented Reality." Communications of the ACM, 36(7), pp. 53-62.

[Image from the 1993 document]

Looking around now, what will be hitting mainstream in 17 years?

by Matthew Hurst at July 26, 2010 04:19 AM

July 25, 2010

UMBC Ebiquity

New Yorker on voting systems and fair elections

This week’s New Yorker magazine has an article by Anthony Gottlieb on different voting systems, including range voting.

WIN OR LOSE: No voting system is flawless. But some are less democratic than others. Can theorists engineer a better way to elect candidates?

The article provides an interesting introduction to some of the voting systems that have been developed and used over the centuries, along with their advantages and vulnerabilities. There’s no mention of Scantegrity, security, or the general issue of verifiability, however.

It’s actually in the Books section, so I guess it is ostensibly a review of a new book, “Numbers Rule: The Vexing Mathematics of Democracy, from Plato to the Present”, by journalist and mathematician George Szpiro.

The article also mentions a book by William Poundstone, “Gaming the Vote: Why Elections Aren’t Fair (and What We Can Do About It)”, which is a steal on Amazon for $5.00. Such a steal that I ordered two last week, one for me and one to share. Poundstone, btw, has written some good popular books on a wide range of topics (e.g., game theory, technical interviewing techniques, etc.). I’ve read quite a few and both enjoyed them and learned things. According to Wikipedia, he is a cousin of comedian Paula Poundstone!

by Tim Finin at July 25, 2010 05:11 PM

Data Mining: Text Mining, Visualization and Social Media

You Think You've Got Problems?

I'm re-reading Mosteller's Fifty Challenging Problems in Probability with Solutions while traveling in Japan. This is absolutely one of my favourite books (not least because it is such a tiny volume). Frederick Mosteller was "one of the most eminent statisticians of the 20th century" (according to Wikipedia), and a statistician who was passionate about education. The book actually has 56 problems, generally simply stated, like the following:

A drawer contains red socks and black socks. When two socks are drawn at random, the probability that they are red is 1/2. (a) How small can the number of socks in the drawer be? (b) How small if the number of black socks is even?
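If the drawer holds r red and b black socks (n = r + b), the probability that two socks drawn without replacement are both red is r(r-1) / (n(n-1)), so the condition is 2r(r-1) = n(n-1). A brute-force search with exact fractions is a quick way to check answers (spoiler ahead); the function name and search limit are arbitrary choices, and Mosteller's own solution reasons through the inequalities by hand rather than searching.

```python
from fractions import Fraction


def smallest_drawer(require_even_black=False, limit=1000):
    """Smallest total n (r red, b = n - r black) with
    P(two random socks are both red) = r(r-1) / (n(n-1)) = 1/2."""
    for n in range(2, limit):
        for r in range(2, n + 1):
            b = n - r
            if require_even_black and b % 2:
                continue
            # Exact rational arithmetic avoids floating-point equality issues.
            if Fraction(r * (r - 1), n * (n - 1)) == Fraction(1, 2):
                return n, r, b
    return None


print(smallest_drawer())                         # (4, 3, 1)
print(smallest_drawer(require_even_black=True))  # (21, 15, 6)
```

So (a) four socks suffice (3 red, 1 black: 3/4 × 2/3 = 1/2), and (b) with an even number of black socks the smallest drawer holds 21 (15 red, 6 black).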

Reading the problems in this book led me to think about what constitutes a 'good' puzzle. Personally, I like problems that have one or more of the following qualities:

  • Counter-intuitive: the birthday problem is like this; 23 sounds like far too few people for a shared birthday to be more likely than not.
  • Relies on basic number theory: Mosteller's problems require things like working with geometric series, binomial coefficients, etc.
  • Motivates you to think probabilistically about things that you don't generally consider in that way: for me, those problems involving randomly throwing sticks on a table are fun (but I do remember Johnny Ball performing this experiment live once on the BBC...)
  • Problems that seem recursive but can be solved simply.
  • Entertainingly cast as a brief story (e.g., a three-person duel).
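The birthday problem's counter-intuitive 23 is easy to confirm numerically. A minimal sketch, assuming birthdays are independent and uniform over 365 days (no leap days): the chance that k people all have distinct birthdays is the product of (365 - i)/365 for i from 0 to k - 1, and the chance of a shared birthday is one minus that.

```python
def p_shared_birthday(k, days=365):
    """Probability that at least two of k people share a birthday,
    assuming independent birthdays uniform over `days` days."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose shared-birthday probability exceeds threshold."""
    k = 1
    while p_shared_birthday(k, days) <= threshold:
        k += 1
    return k


print(round(p_shared_birthday(23), 4))  # 0.5073
print(smallest_group())                 # 23
```

With 22 people the probability is still just under one half (about 0.476); the 23rd person tips it over.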

Intriguingly, Mosteller's solutions, while nicely written with an engaging informality, still require concentration, especially when they skip a few steps en route to the prize. Generally, he demonstrates a great way to attack the problems: first play with a few examples, then run through one or more full solutions (often employing induction and negation tactics).

If you are looking for other good sources of puzzles, Car Talk (on NPR, Saturday at 9) offers not just amusing car problems, but also the occasional Mosteller-like puzzler...

by Matthew Hurst at July 25, 2010 12:30 PM

July 24, 2010

Data Mining: Text Mining, Visualization and Social Media

Task Oriented Search (Bing), Task Oriented Browsing (Firefox) - Let's Dance!

I've just watched the video introducing Firefox's TabCandy:

An Introduction to Firefox's Tab Candy from Aza Raskin on Vimeo.

Much of what is presented is not novel, but novelty is not the heart of innovation. Execution is, and TabCandy looks like it has a great chance of taking off, given the Firefox user base and the simplicity of the user experience.

Bing's whole approach to differentiation has been to focus on tasks, and a big part of TabCandy is helping the user manage many tasks at once, especially long-running ones. Should the browser embrace the information sources, or should the search engine embrace browser-like state?

by Matthew Hurst at July 24, 2010 01:27 AM

July 20, 2010

Data Mining: Text Mining, Visualization and Social Media

Facebook Using Tagged Photos for Authentication

Logging on to Facebook from Japan triggered a lengthy authentication process. Because I had only ever used the site from US IP addresses, logging in from a Japanese address forced me through a process that involved answering a number of face-identification questions in a multiple-choice format. This was quite unnerving for the following reasons:

  1. Some of the people I knew from face-to-face interactions, but I hadn't seen them in a long time (people change over time).
  2. Some of the people I had never met and was barely able to figure out who they were. If it weren't for the multiple choice format of the challenge I would have failed.
  3. Some of the photos - this is true - had tagged people who were wearing fancy dress costumes including a full mask. The face was literally not visible, and I wouldn't be getting any points for recognizing a dead president. Fortunately Facebook showed two different images per person.

Facebook gets full points for an innovative application of tagged pictures, but it wouldn't have surprised me at all if I had failed to 'reactivate' my account (judging by the language used, logging in from a different country apparently 'deactivates' your account).

The irony for me was that I had actually meant to log on to Foursquare but my fingers weren't listening to my brain. I was actually already in the authentication process before I realised - this doesn't look like Foursquare...

by Matthew Hurst at July 20, 2010 10:54 AM

Data Mining: Text Mining, Visualization and Social Media

Checking In to Japan on Foursquare

I've just done my first Japan check-in on Foursquare, and it was non-trivial. First, I had to add a location via Foursquare's website. This was a real challenge, as the site is not really aligned with the way in which Japanese addresses are expressed. After a lot of hacking (throughout which Foursquare kept telling me it couldn't find the location), I figured out that if I could tickle the Google location search via the Foursquare interface, I could get the right place to show up. Finally, this succeeded and I added 京急富岡駅 (Keikyu Tomioka Station).

This would have been a lot easier if I could have directly located the venue on the map (as a lat/lon position) and then simply named it. That mode of entry seems like a pretty general backup plan for any problems, and easy to implement, so I'm assuming it is not available due to some potential for abuse.

Then I wanted to check in. This can be done via the Foursquare mobile web page (first you need to update your location).



by Matthew Hurst at July 20, 2010 12:30 AM

July 19, 2010

Complexity and Social Networks Blog

Study: eParticipation and Web 2.0 in German state and local government - Still in beta phase

After years of writing on the potential of using the Internet to improve democratic governance, one might expect citizens to have numerous offerings to choose from. Lately, governments have introduced numerous policy papers and declarations that prioritize citizen-government interaction through information, consultation or collaboration. Unfortunately, the UN eGovernment benchmark and other writings draw a rather pessimistic picture of the current state of eParticipation in Western democracies.

Because of the increasing public discourse on Government 2.0 and Germany's astounding improvement by 46 ranks from 2008 to 14th place in the area of eParticipation in this year's UN eGovernment benchmark, we tried to take a closer look at the current state of eParticipation and Web 2.0 in German state and local government (follow the link to download the study).

Using web-based data collection, we analyzed the eParticipation offerings and use of Web 2.0 applications on the web portals of Germany's 50 largest cities and 16 federal states in the areas of urban planning, budgetary planning, complaints/suggestions and citizen services, within a four-step policy cycle.

The results underline that informational integration of citizens in government outweighs consultative approaches. This study illustrates that while states and municipalities have eParticipation on their agenda, they lack the willingness or resources to fully engage in it. For the cases studied, German Government 2.0 activities seem to be in beta phase. It is, therefore, important to focus on three areas. First, improve knowledge of the potential, limits and implementation of eParticipation and Web 2.0 applications in politics and government. Second, convince government officials to just try out new things and sail into uncharted waters. Third, give citizens the opportunity to learn participation in various ways as early as possible. Most certainly, all of these recommendations hold true for other countries as well.

July 19, 2010 01:52 PM

July 17, 2010

Data Mining: Text Mining, Visualization and Social Media

Visualizing Location and Mood in Twitter

Last year I wrote about some work done by Sune Lehmann and colleagues at the Barabasi Labs which explored the relationship between location and affect signals in Twitter (here). Sune pointed me to a recent update which extends the work to a cartogram approach to visualizing moods and volume: Mood, Twitter and the new shape of America.

The work explores both visualization methods and the data itself. It also raises some interesting questions, not the least of which is the apparent difference between the coasts and the bits in the middle. I'd be interested in seeing an analysis of these distinctions.

by Matthew Hurst at July 17, 2010 08:58 PM

July 16, 2010

UMBC Ebiquity

Google acquires Metaweb and Freebase

Google announced today that it has acquired Metaweb, the company behind Freebase — a free, semantic database of “over 12 million people, places, and things in the world.” This is from their announcement on the Official Google blog:

“Over time we’ve improved search by deepening our understanding of queries and web pages. The web isn’t merely words — it’s information about things in the real world, and understanding the relationships between real-world entities can help us deliver relevant information more quickly. … With efforts like rich snippets and the search answers feature, we’re just beginning to apply our understanding of the web to make search better. Type [barack obama birthday] in the search box and see the answer right at the top of the page. Or search for [events in San Jose] and see a list of specific events and dates. We can offer this kind of experience because we understand facts about real people and real events out in the world. But what about [colleges on the west coast with tuition under $30,000] or [actors over 40 who have won at least one oscar]? These are hard questions, and we’ve acquired Metaweb because we believe working together we’ll be able to provide better answers.”

In their announcement, Google promises to continue maintaining Freebase “as a free and open database for the world” and invites other web companies to use and contribute to it.

Freebase is a system very much in the linked open data spirit, even though RDF is not its native representation. Its content is available as RDF and there are many links that bind it to the LOD cloud. Moreover, Freebase has a very good wiki-like interface allowing people to upload, extend and edit both its schema and data.

Here’s a video on the concepts behind Metaweb, which are, of course, also those underlying the Semantic Web. What’s the difference? I’d say a combination of representational details and a centralized (Metaweb) vs. distributed (Semantic Web) approach.

by Tim Finin at July 16, 2010 07:30 PM

UMBC Ebiquity

Search neutrality: Google and Danny Sullivan weigh in

Web search guru Danny Sullivan has a great response to the NYT editorial on regulating search engine algorithms: The New York Times Algorithm and Why It Needs Government Regulation. Here’s how it starts:

“The New York Times is the number one newspaper web site. Analysts reckon it ranks first in reach among US opinion leaders. When the New York Times editorial staff tweaks its supersecret algorithm behind what to cover and exactly how to cover a story — as it does hundreds of times a day — it can break a business that is pushed down in coverage or not covered at all.”

Google published its own response to the Times piece as a Financial Times op-ed and also posted it to the Google public policy blog: regulating what is “best” in search?

“Search engines use algorithms and equations to produce order and organisation online where manual effort cannot. These algorithms embody rules that decide which information is “best”, and how to measure it. Clearly defining which of any product or service is best is subjective. Yet in our view, the notion of “search neutrality” threatens innovation, competition and, fundamentally, your ability as a user to improve how you find information.”

The penultimate paragraph gives what they say is their strongest argument against mandating “search neutrality”.

“But the strongest arguments against rules for “neutral search” is that they would make the ranking of results on each search engine similar, creating a strong disincentive for each company to find new, innovative ways to seek out the best answers on an increasingly complex web. What if a better answer for your search, say, on the World Cup or “jaguar” were to appear on the web tomorrow? Also, what if a new technology were to be developed as powerful as PageRank that transforms the way search engines work? Neutrality forcing standardised results removes the potential for innovation and turns search into a commodity.”

This assumes, of course, that there is real competition among Internet search engines. Microsoft has been putting a lot of research and development into Bing with good results, and it’s been gaining market share. Yahoo is doing very interesting things as well. Consumer choice among a handful of competitors would be the best way to ensure that none abuse their customers.

by Tim Finin at July 16, 2010 05:01 AM

July 15, 2010

UMBC Ebiquity

New York Times editorializes about the Google search ranking algorithm

In what may be a first, today’s New York Times has an editorial about an algorithm. No, they haven’t waded into the P=NP issue, but commented on Google’s algorithm for ranking search results and accusations that Google unfairly biases it in its own self-interest.

“In the past few months, Google has come under investigation by antitrust regulators in Europe. Rivals have accused Google of placing the Web sites of affiliates like Google Maps or YouTube at the top of Internet searches and relegating competitors to obscurity down the list. In the United States, Google said it expects antitrust regulators to scrutinize its $700 million purchase of the flight information software firm ITA, with which it plans to enter the online travel search market occupied by Expedia, Orbitz, Bing and others.”

This issue will become more important as the companies dominating Web search (Google, Microsoft and Yahoo) continue to increase their importance and also broaden their acquisition of companies offering web services.

The NYT’s position is moderate, recommending:

Google provides an incredibly valuable service, and the government must be careful not to stifle its ability to innovate. Forcing it to publish the algorithm or the method it uses to evaluate it would allow every Web site to game the rules in order to climb up the rankings — destroying its value as a search engine. Requiring each algorithm tweak to be approved by regulators could drastically slow down its improvements. Forbidding Google to favor its own services — such as when it offers a Google Map to queries about addresses — might reduce the value of its searches. With these caveats in mind, if Google is to continue to be the main map to the information highway, it concerns us all that it leads us fairly to where we want to go.

by Tim Finin at July 15, 2010 06:28 PM

July 14, 2010

Complexity and Social Networks Blog

Mood, twitter, and the new shape of America

Twitter is a gigantic repository for our collective state of mind. Every second, thousands of tweets reveal what everybody and their mother had for lunch, what Justin Bieber is up to, or what magnificent link you should be checking out right now. Individually, each tweet is mostly interesting to friends/fans of the tweeter, but taken together they add up to something more.

By analogy with individual neurons firing together to give rise to human consciousness, the billions of tweets have meaningful macro-states that contain information about the whole system rather than about the individual tweeters. But we need to do a little data mining to extract meaningful information about these states, to expose our collective states of mind.

As a proof of concept, we've[1] been studying the mood[2] of all public tweets. While there are many services that will allow you to study the mood of your own tweets (and also a neat little DIY project to show you the global average of Twitter), much less effort has gone into studying how mood breaks down according to geography. Below, I show a brand new video displaying the pulsating 24-hour Twitter mood cycle of the United States (I'll explain just what you're looking at in the following).

In the video, green corresponds to a happy mood and red corresponds to a grumpier state of mind. The area of each state is scaled according to the number of tweets originating in that state. Note how the East Coast is consistently 3 hours ahead of the West Coast, so when we're sleeping in Boston, the Californians are tweeting away. It's also interesting that better weather seems to make you happier (or rather, that better weather is correlated with happier tweets): Florida and California seem to be consistently in a better mood than the rest of the US. Also note how New Mexico and Delaware behave very differently from their neighbors. Full results, individual maps, and a high-res poster can be found on the dedicated Twitter Mood website.

How to construct the mood map

Since many Twitter users list their location, we've assigned every tweet in our (massive) database to a US county and extracted its mood. This allows us to average over tweets and plot the mood of the US as a function of geography (and time). However, since the US is unevenly populated, the resulting maps are boring: only a few counties (the centers of cities) contain most of the tweets (not too many tweets in Ellsworth, Nebraska yet).
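The aggregation step described above (assign each tweet to a county, then average mood per place and time) can be sketched as a grouped mean. This is a minimal illustration, not the twittermood project's code: it assumes each tweet has already been given a numeric mood score (the actual scoring method is described on the Twitter Mood website), and the location lookup table, function names and sample tweets are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup from free-text profile locations to US counties.
LOOKUP = {
    "boston, ma": "Suffolk County, MA",
    "cambridge, ma": "Middlesex County, MA",
}


def county_of(location):
    """Map a profile location to a county name, or None if unknown."""
    return LOOKUP.get(location.strip().lower())


def mood_by_county_hour(tweets, county_of):
    """Average mood per (county, hour).

    tweets: iterable of (profile_location, hour_of_day, mood_score).
    Tweets whose location cannot be resolved are dropped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for location, hour, score in tweets:
        county = county_of(location)
        if county is None:
            continue
        key = (county, hour)
        totals[key] += score
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


tweets = [
    ("Boston, MA", 9, 0.8),
    ("boston, ma", 9, 0.4),
    ("Cambridge, MA", 9, 0.6),
    ("Mars", 9, 1.0),  # unresolvable location, ignored
]
print(mood_by_county_hour(tweets, county_of))
```

The per-county tweet counts gathered along the way are also exactly what the cartogram step needs to decide how large each region should be drawn.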

Luckily, brilliant people have come up with a cool way of solving this problem using a technique called density equalizing maps[3] (or cartograms). The idea is simple: warp the map so that certain features of shape are preserved while the (population) density becomes the same everywhere. The resulting maps look like something from an alternate universe and allow us to show the US mood much more clearly.
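Full density-equalizing algorithms (the diffusion-based method of Gastner and Newman is one well-known example) continuously deform region boundaries. The sketch below covers only the goal those algorithms work towards: keep the total map area fixed and give each region an area proportional to its count, so that count per unit area is the same everywhere. It does not reshape any boundaries, and the region names and numbers are made up.

```python
def target_areas(areas, counts):
    """Area each region should have in a density-equalizing map.

    The total map area is preserved and redistributed in proportion to each
    region's count, so count / area ends up the same for every region.
    """
    total_area = sum(areas.values())
    total_count = sum(counts.values())
    return {region: total_area * counts[region] / total_count for region in areas}


areas = {"A": 10.0, "B": 30.0}  # original areas (arbitrary units)
counts = {"A": 300, "B": 100}   # tweets per region
new = target_areas(areas, counts)
print(new)  # {'A': 30.0, 'B': 10.0}
```

Here the small but busy region A triples in size while the large, quiet region B shrinks to a third, and both end up with 10 tweets per unit area.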

Notes

[1] The twittermood project members are Alan Mislove, YY Ahn, JP Onnela, Niels Rosenquist, and the undersigned.

[2] For a deeper explanation of how we evaluate the mood of tweets, see the Twitter Mood website.

[3] An easily accessible explanation of density equalizing maps is posted on the Twitter Mood website.

July 14, 2010 01:33 AM

July 10, 2010

Data Mining: Text Mining, Visualization and Social Media

No Plan Survives Contact with the Data

Recently, I've been working through a number of planning processes with varying degrees of uncertainty. Some thoughts:

  • There are a number of different areas of uncertainty in any systems plan. Two key areas are engineering (or technical) uncertainty and data and inference uncertainty.
  • Uncertainty around engineering involves questions about technical capabilities of platforms and architectures with respect to requirements.
  • Uncertainty around data and inference involves the characteristics of the data (does it even include the stuff that I want?) and the nature of the inference problem (is it mathematically attainable to the degree required?).
  • Engineering uncertainty can be reduced by the application of technical knowledge and rapid prototyping.
  • Data and inference uncertainty requires exploration of data (assuming it is novel to the team).

In general, one can't take a rationalist approach to planning and, in my opinion, it is better to reduce uncertainty by acting, whether that means prototyping, data mining, or whatever. In this regard, the exercise of building an end-to-end system (where some thought and commitment is given to the gross functional architectural design) is useful.

Barney recommended reading 'Made to Stick' by the brothers Heath. Generally, I'm somewhat wary of bestseller big idea books, but I'm finding this one already engaging.

In the first chapter, there is an account of the military's approach to planning. Colonel Tom Kolditz is quoted:

No plan survives contact with the enemy. [a phrase attributed to Helmuth von Moltke the Elder]

The book then goes on to describe the notion of Commander's Intent: what matters is the intended result. It is imperative that this intent be crisp; the rest of the plan then flows from the structure of the organization and from adapting actions as the activity unfolds, up to the achievement of the goal.

Of course, in the fierce world of internet applications and services, one might regard the competition as the enemy. But the analogy works well with respect to planning where the data is the 'enemy', thus:

No plan survives contact with the data.

by Matthew Hurst at July 10, 2010 06:41 PM

July 09, 2010

UMBC Ebiquity

Google Open Spot Android app finds parking

Google’s Open Spot Android app lets people leaving parking spots share that information with others searching for parking nearby. Running the app shows you parking spots within a 1.5 km radius. Newly reported spots are assumed to be gone after 20 minutes and are removed from the system.
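The two rules above (show spots within 1.5 km; drop spots after 20 minutes) amount to a distance filter plus an age filter. A minimal sketch, not Google's implementation: the `Spot` record, the use of the haversine great-circle formula for distance and the sample coordinates are assumptions; only the 1.5 km and 20-minute values come from the post.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
SEARCH_RADIUS_KM = 1.5     # from the post
SPOT_LIFETIME_S = 20 * 60  # from the post: spots expire after 20 minutes


@dataclass
class Spot:
    lat: float
    lon: float
    reported_at: float  # seconds since the epoch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def visible_spots(spots, lat, lon, now):
    """Spots that have not yet expired and lie within the search radius."""
    return [
        s
        for s in spots
        if now - s.reported_at < SPOT_LIFETIME_S
        and haversine_km(lat, lon, s.lat, s.lon) <= SEARCH_RADIUS_KM
    ]


spots = [
    Spot(37.7749, -122.4194, reported_at=1000),  # nearby, 200 s old
    Spot(37.7755, -122.4190, reported_at=-200),  # nearby, but 1400 s old
    Spot(37.8044, -122.2712, reported_at=1100),  # fresh, but ~13 km away
]
print(visible_spots(spots, 37.7749, -122.4194, now=1200))
```

A real service would use a spatial index rather than scanning every spot, but the filtering rules would be the same.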

People who announce open spots gain karma points, while those who report false spots, known as griefers, are on notice:

“We’re watching for behavior that looks like a griefer spoofing parking spots. We have a couple of mechanisms available to make sure someone can’t leave a bunch of fake parking spots. If we see this happening we will take steps to fix it.”

This is a simple example of a context-aware mobile app that could benefit further from knowing that you are driving your car (as opposed to riding in it) and are likely to want to find a parking spot, as opposed to doing 70mph on I-95 as it goes through Baltimore. Moreover, context could also tell the app that you are probably leaving a public parking spot, so that it could mark the spot automatically. However, such a feature should be smart enough to avoid being tagged by Google as a griefer and finding out what punishment Google has in store for you.

by Tim Finin at July 09, 2010 11:02 PM

Connected Action

Bernie Hogan’s Facebook Network Map featured in Journal of Social Structure (JOSS) (Made with NodeXL)

The Journal of Social Structure has released its First Annual JoSS Visualization Symposium results and two of the images were generated with NodeXL.  One of the two is Bernie Hogan’s radial layout applied to representing Facebook Friend networks.

http://jossviz.wordpress.com/2010/06/23/friendwheel-layout-of-a-facebook-network/

The Journal of Social Structure (JoSS) is an electronic journal of the International Network for Social Network Analysis (INSNA).  Here is Bernie’s description of the graph.

This is a “pinwheel” diagram using the author’s Facebook personal network (captured July 15, 2009).

Nodes represent the author’s friends and links represent friendships among them. The author is not shown. Each ‘wing’ radiating outwards is a partition found using a greedy community detection algorithm (Wakita and Tsurumi, 2007). Wings are manually labelled. Node ordering within each wing is based on degree. Node color and size are also based on degree. Node position is based on a polar coordinate system: each node is placed at an equal angular step of 360º/n, with the radius being a log-scaled measure of betweenness. Higher values are closer to the center, indicating a sort of cross-partition ‘gravity’.

This layout has several notable features:

- The angle of each wing is proportional to its share of the network. Thus a wing containing 25 percent of the nodes spans 90º (for example, from 0 to 90º).

- Partitions are distinguished by their position rather than a node’s color or shape.

- The tail indicates the periphery of each partition. A wing with many tail nodes indicates many people who are only tied to other group members.

- Edges crossing the center show between-partition connections. Since nodes are sorted by degree it is easy to see if edges originate from the most highly connected nodes or the entire partition.
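The layout rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bernie Hogan's code: the function name, the `inner`/`outer` radius bounds, the use of `log1p` for log scaling and the descending degree order within each wing are all choices made here for the example.

```python
import math


def pinwheel_layout(wings, degree, betweenness, inner=0.1, outer=1.0):
    """Polar 'pinwheel' layout: one wedge per partition, radius from betweenness.

    wings:       list of lists of node ids, one list per partition (wing)
    degree:      dict node -> degree, used to order nodes inside a wing
    betweenness: dict node -> betweenness, used (log-scaled) for the radius
    Returns dict node -> (x, y).
    """
    n = sum(len(w) for w in wings)
    # Each node gets an equal angle of 360°/n, so a wing's angular span is
    # automatically proportional to its share of the nodes.
    step = 2 * math.pi / n
    logs = {v: math.log1p(betweenness[v]) for w in wings for v in w}
    top = max(logs.values()) or 1.0  # avoid dividing by zero if all are 0
    positions = {}
    index = 0
    for wing in wings:
        # Highest-degree nodes first; low-degree nodes form the wing's tail.
        for v in sorted(wing, key=lambda u: degree[u], reverse=True):
            theta = index * step
            # Higher betweenness -> smaller radius, i.e. closer to the centre.
            r = inner + (outer - inner) * (1.0 - logs[v] / top)
            positions[v] = (r * math.cos(theta), r * math.sin(theta))
            index += 1
    return positions
```

Keeping a small `inner` radius stops the most central node from collapsing onto the exact origin, where its label would be hard to read.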





Bernie’s chapter on analyzing Facebook networks with NodeXL appears in the book: Analyzing Social Media Networks with NodeXL: Insights from a connected world.




by Marc Smith at July 09, 2010 02:00 AM

July 08, 2010

Connected Action

July 22nd, 2010 SNA event at Stanford: Network Analysis Made Easy: Using NodeXL To Map Social Media Networks

There is a Stanford Media X event on July 22nd, 2010 on new tools for SNA:

Network Analysis Made Easy:  Using NodeXL To Map Social Media Networks

http://mediax.stanford.edu/WSI/marc.html

Bring a laptop (running Windows and Office 2007 or 2010) to this workshop and you can be analyzing a social media network from systems like Twitter, Flickr, YouTube and your own email by the end of the day. If you can make a pie chart in Excel, you can use the free and open NodeXL (http://nodexl.codeplex.com) to make a rich network graph from data extracted from social media systems and other common formats. If you have a network, bring it; if not, bring a suggested topic that we can map during the course of the day.

Even if you leave your laptop behind or have a Mac (sorry, no version is yet available for Mac OS, unless you have a virtual machine with Windows and Office), this workshop will introduce the core concepts of network science, with application to social networks in general and social media networks in particular. Applied to a range of topics and services, social media network maps can illuminate a variety of “publics”: populations who share a common interest and may share connections. Maps of topics like “oil spill”, “global warming” and other issue- and event-related keywords can reveal the groups and factions that cluster around different concepts and terms. Key contributors in these maps can be identified by applying network measurements that capture various aspects of a person’s location in a network graph.




by Marc Smith at July 08, 2010 06:13 PM

Connected Action

July 12-13, 2010: Microsoft Research Faculty Summit, Redmond, WA


The 2010 Microsoft Research Faculty Summit was held July 12 and 13 in Redmond, Washington. Among the many panels and discussions related to the state of computer science, the NodeXL team had several representatives talking about the ways network science education can be expanded using an easy-to-use application for network analysis built on Excel.

Jimmy Lin from the University of Maryland also attended to speak about programming in the cloud.

Here is the abstract for the NodeXL talk:

NodeXL: Social Network Analysis in Excel. Natasa Milic-Frayling, Microsoft Research; Ben Shneiderman, University of Maryland; Marc Smith, Connected Action

Businesses, entrepreneurs, individuals, and government agencies alike are looking to social network analysis (SNA) tools for insight into trends, connections, and fluctuations in social media. Microsoft’s NodeXL is a free, open-source SNA plug-in for use with Excel. It provides instant graphical representation of relationships of complex networked data. But it goes further than other SNA tools: NodeXL was developed by a multidisciplinary team of experts who bring together information studies, computer science, sociology, human-computer interaction, and over 20 years of visual analytic theory and information visualization into a simple tool anyone can use. This makes NodeXL of interest not only to end-users but also to researchers and students studying visual and network analytics and their application in the real world. NodeXL has the unique feature that it imports networks from Outlook email, Twitter, flickr, YouTube, WWW, and other sources, plus it offers a rich set of metrics, layouts, and clustering algorithms. This talk will describe NodeXL and our efforts to start the Social Media Research Foundation.

Some photos from the event:


Saul Greenberg


Ben Shneiderman and Andy van Dam


Ben Shneiderman, Natasa Milic-Frayling and Marc Smith


Tom McMail and Marc Smith




by Marc Smith at July 08, 2010 05:42 PM

Connected Action

Automatic for the people (who use the latest NodeXL!). Release v.1.0.1.128

The NodeXL team has just released a new version (v.1.0.1.128) that contains a new “Automation” feature that allows users to define a collection of operations to perform on their network graphs and invoke the complete set in a single button click AND reuse that configuration on other workbook graphs.  In fact, the feature will apply the configuration you define to all the files you specify, allowing easy processing of large collections of network data sets.

This week the feature is partially complete. Users can automate the following operations: merge duplicate edges, calculate graph metrics, auto-fill columns, create subgraph images, find clusters, and show graph. These operations can require dozens of clicks when performed manually. If you have dozens or hundreds of network data sets, the result is a daunting case of repetitive strain injury and carpal tunnel syndrome. With automation, these operations can instead be carried out orders of magnitude more often without much pain!

The next release will feature the complete package, which will also include control over the layout and graph options. As a result, network visualizations can be produced automatically in a pipeline: users will be able to specify a query using the NodeXL desktop network data collector and then automate the processing of large collections of data sets.
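The core idea of the Automation feature, one saved list of operations applied in order to every data set, can be sketched outside Excel. This is a hypothetical illustration only: it is not NodeXL's implementation or API, and the operation names, the dictionary-based graph representation and `run_automation` are invented for the example (degree stands in for "graph metrics").

```python
from collections import Counter


def merge_duplicate_edges(graph):
    """Collapse repeated (source, target) pairs into one weighted edge."""
    graph["weights"] = Counter(graph["edges"])
    graph["edges"] = list(graph["weights"])  # first-seen order is preserved
    return graph


def calculate_degree(graph):
    """Store each vertex's degree, a simple stand-in for 'graph metrics'."""
    degree = Counter()
    for s, t in graph["edges"]:
        degree[s] += 1
        degree[t] += 1
    graph["degree"] = dict(degree)
    return graph


# Hypothetical registry mapping configuration names to operations.
OPERATIONS = {
    "merge_duplicate_edges": merge_duplicate_edges,
    "calculate_degree": calculate_degree,
}


def run_automation(config, workbooks):
    """Apply the same ordered list of operations to every data set."""
    results = {}
    for name, edges in workbooks.items():
        graph = {"edges": list(edges)}
        for op_name in config:
            graph = OPERATIONS[op_name](graph)
        results[name] = graph
    return results
```

Because the configuration is just data, the same list can be reused across every time slice of a longitudinal data set, which is what makes comparing many slices practical.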

The result should be better analysis of time series data sets that have many “slices”.  The feature points the way to additional development work for supporting the comparison between networks to evaluate their evolution.


The REM album “Automatic for the people” takes its title from the motto of Athens, Georgia, eatery Weaver D’s Delicious Fine Foods.



by Marc Smith at July 08, 2010 03:57 PM

Connected Action

Paper: Tech Report at University of Maryland on EventGraphs

A new paper on visualizing social media has been released on the University of Maryland Human Computer Interaction Laboratory tech report archive. Co-authored by Derek Hansen, myself, and Ben Shneiderman, the paper describes and visualizes the patterns of connections formed when people tweet about events like conferences and news stories.


http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-13

Hansen, D., Smith, M., Shneiderman, B.

EventGraphs: Charting Collections of Conference Connections

HCIL-2010-13

EventGraphs are social media network diagrams constructed from content selected by its association with time-bounded events, such as conferences. Many conferences now communicate a common “hashtag” or keyword to identify messages related to the event. EventGraphs help make sense of the collections of connections that form when people follow, reply or mention one another and a keyword. This paper defines EventGraphs, characterizes different types, and shows how the social media network analysis add-in NodeXL supports their creation and analysis. The paper also identifies the structural and conversational patterns to look for and highlight in EventGraphs and provides design ideas for their improvement.
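A minimal sketch of the mention/reply part of an EventGraph, assuming tweets are available as `(author, text)` pairs. Follow edges need separate follower data and are not modeled here; the function name, the regular expressions and the lower-casing of usernames are assumptions, and this is not NodeXL's importer.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def event_graph(tweets, keyword):
    """Build a mention/reply network from tweets that contain `keyword`.

    tweets: iterable of (author, text) pairs.
    Returns (vertices, edges): every author who used the keyword is a vertex
    (isolates included), and edges maps (author, mentioned user) -> count.
    """
    # Match the keyword case-insensitively, but not as a prefix of a longer tag.
    pattern = re.compile(re.escape(keyword) + r"\b", re.IGNORECASE)
    vertices, edges = set(), Counter()
    for author, text in tweets:
        if not pattern.search(text):
            continue
        author = author.lower()
        vertices.add(author)
        for target in MENTION.findall(text):
            target = target.lower()
            if target != author:
                edges[(author, target)] += 1
                vertices.add(target)
    return vertices, edges
```

Keeping keyword users who mention nobody as vertices matters: those isolates are one of the structural patterns an EventGraph is meant to reveal.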




by Marc Smith at July 08, 2010 02:23 PM

Complexity and Social Networks Blog

The arrest of a suspect in the "Grim Sleeper" killings

Four years ago, Frederick Bieber, Charles Brenner and I wrote a paper in Science on the feasibility of "familial searching" of offender DNA databases for leads. Familial searching utilizes the known statistical correlations in the genetic profiles of close relatives to produce investigative leads. I followed this up with a Taubman Center "policy brief" on the ethical and practical implementation issues of a familial search policy. Yesterday familial searching produced a striking breakthrough in LA's notorious "Grim Sleeper" serial murder case: the arrest of Lonnie David Franklin Jr. because the DNA from a piece of pizza he discarded while he was under police surveillance matched the DNA from the Grim Sleeper crime scenes.

The reason why the police had placed Franklin under surveillance? Because DNA from his son, convicted of a qualifying offense, had produced a "familial match" to the crime scene profile. This match, in turn, led to a frantic search through the son's family tree, and ultimately to the surveillance of Franklin.

Without getting too deeply into the broader ethical/policy issues here, the essential policy conundrum is that familial searching is potentially effective, but de facto brings millions of individuals who have not even been suspected of any crime into the offender database. Currently only two states (California and Colorado) allow familial searching, but the Grim Sleeper case offers a dramatic ethical boundary case in the ongoing policy battle. Proponents of familial searching can now pose a non-hypothetical question: can one justify not using familial searching in a serial murder investigation where there is ongoing danger to the public?



July 08, 2010 01:18 PM

July 05, 2010

Data Mining: Text Mining, Visualization and Social Media

FindTheBest.com A Comparison Site for Everything

Jack Middlebrook from FindTheBest.com dropped me a line to tell me about the site, and specifically the matrix it generates for comparing teams playing in the World Cup.

FindTheBest describes itself thus:

an objective comparison search engine that allows you to find a topic, compare your options and decide what's best for you. Ultimately, FindTheBest allows you to make faster and more informed decisions by allowing you to easily compare all the available options.

The site provides a table of entities and attributes, similar to Google Squared.




The site then allows you to select a number of rows, which it pivots into a readable comparison set.
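The select-then-pivot step can be illustrated with plain dictionaries. This is a hypothetical sketch of the idea, not FindTheBest's code; the function name and table shape are assumptions.

```python
def pivot_selection(rows, key, selected):
    """Pivot selected entity rows into an attribute-by-entity comparison.

    rows:     list of dicts, one per entity, all with the same fields
    key:      the field that names each entity
    selected: names of the entities to compare, in display order
    Returns {attribute: {entity: value}}, i.e. attributes become rows and the
    selected entities become columns.
    """
    by_name = {r[key]: r for r in rows}
    chosen = [by_name[name] for name in selected]
    attributes = [a for a in rows[0] if a != key] if rows else []
    return {a: {r[key]: r.get(a) for r in chosen} for a in attributes}
```

For example, with rows such as `{"team": "Team A", "goals": 5, "wins": 2}` (placeholder values), selecting two teams yields one line per attribute with the two teams side by side, which is easier to scan than two long rows.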


by Matthew Hurst at July 05, 2010 09:58 PM

UMBC Ebiquity

Wikipedia offline due to power outage

Wikipedia was offline for nearly twelve hours today, starting about 11:00am EDT. According to Wikipedia’s Twitter feed:

“Thanks for being patient, everyone. We’ve figured out the problem: power outage in our Florida data center. Slowly coming back online!”

This is not the first time that Wikimedia has experienced problems caused by power outages. In March 2010, Wikipedia was also knocked offline globally:

“Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries. However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.”

According to a story in itnews:

“The cluster is hosted in a co-location facility in Tampa, Florida, which has approximately 300 servers, a 350 Mbps connection, and supports up to 3,000 hits per second, or 150 million hits per day. Two other server clusters – knams in Amsterdam, Netherlands and yaseo, provided by Yahoo! in Seoul, South Korea – also provide hosting and bandwidth to serve users in various regions.”

It looks like there are still failover problems. :-( We can watch the Wikimedia Technical blog for more information.

by Tim Finin at July 05, 2010 03:39 AM

UMBC Ebiquity

How could Semantic Overflow eat its own dog food?

Semantic Overflow is great largely because it benefits from the good design and implementation of the StackExchange framework. Could our site be improved with Semantic Web technology, i.e., by eating our own dog food?

It’s not just an academic question. Recently the community QA site Training Examples got quite a bit of visibility as a site

“Where data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!”

If you visit the site you will see that it closely follows the Stack Overflow design, complete with tags, reputation, badges, etc. It uses OSQA, which is free software licensed under the GPL and implemented in Python using Django. Site creator Joseph Turian has mentioned a desire to improve the site by applying machine learning and language processing techniques to its content.

So, how could Semantic Web technology be used to improve our own Q&A site? Add your suggestions here.

by Tim Finin at July 05, 2010 03:00 AM

July 03, 2010

Connected Action

Mapping the connections among people who tweet #sunbelt

The International Sunbelt Social Network Conference is the official conference of the International Network for Social Network Analysis (INSNA).

This year’s INSNA “Sunbelt” conference is at the Riva del Garda Fierecongressi, Trento, Italy! Here is the 2010 INSNA Sunbelt Program.

This is the NodeXL map of connections among people who tweeted the hashtag used for the conference “#sunbelt”.


Having now seen several of these maps for other topics and events (see: http://www.flickr.com/photos/marc_smith/sets/72157622437066929/), I can place this map in context. It is a small group, but it has a high density of connections. It lacks isolates, the people who use the term but do not connect to others who use it. This means that this is a very “in-group” population: if you know to use the #sunbelt hashtag, you probably connect to someone else who uses the term. It is a single major cluster of connected people; no obvious subgraphs or clusters are visible. Not everyone is central in the graph, and those who are have a prominent role in the network science community. Here is the top-ten list of Twitter users who mentioned #sunbelt, ranked by betweenness centrality.

miriamnotten

barrywellman

memeticbrand

isidromj

drewconway

gephi

kristtina

danevans87

valdiskrebs

ciro
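A ranking like the one above can be reproduced from an edge list with Brandes' algorithm for betweenness centrality. This is a standalone sketch, not necessarily how NodeXL computes the metric; it assumes an unweighted, undirected graph and unnormalised scores, and the helper names are illustrative.

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs (unnormalised scores)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def top_k(adj, k=10):
    """Nodes sorted by descending betweenness (ties broken by name)."""
    scores = betweenness(adj)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```

Betweenness rewards users who sit on many shortest paths between others, so in a single dense cluster like this one it tends to pick out the bridging accounts rather than simply the most active ones.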




by Marc Smith at July 03, 2010 04:53 PM

July 01, 2010

Complexity and Social Networks Blog

Connecting the Dots: Harvard Symposium on Network Visualization

We are pleased to announce the first ever CONNECTING THE DOTS symposium on network visualization, at Harvard University on Friday, October 22, 2010. The symposium will feature two exciting keynote speakers:

Alessandro Vespignani, Professor of Informatics and Computing, Indiana University, Bloomington

Ben Fry, co-developer of Processing and data visualization expert

In addition to the keynotes, we are soliciting proposals for guest speakers to give short 20-minute presentations. We are interested in any presentation that includes the visual depiction and/or visual analysis of network data as a central theme. Potential topics include but are not limited to network visualization algorithms, network visualization software, network communities and visualization, other network theory or analysis, and artistic projects centering on network visualization. Given the cross-disciplinary nature of network science, we welcome applications from researchers in any scientific discipline.

To register for this free symposium, please RSVP here. Due to space limitations, we are only able to accept registration for the first 80 participants, so be sure to register early to guarantee a spot! Lunch will be provided. To apply to give a talk, please submit an abstract of your presentation in 250 words or less in addition to your personal information, available on the second page of the registration form.

The symposium is organized by Michael Barnett, Jukka-Pekka Onnela, and Samuel Arbesman of the Christakis Lab at Harvard.

July 01, 2010 10:55 PM

June 30, 2010

UMBC Ebiquity

Training Examples QA: stackoverflow for NLP and ML

Training Examples QA is a site created by Joseph Turian where “data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!”

It’s a close knock-off of the popular Stack Overflow site and appears to be very well done.

If it catches on in the relevant research communities, it could be a very useful resource. (via LingPipe blog)




by Tim Finin at June 30, 2010 05:19 PM

Complexity and Social Networks Blog

Pervasive Overlap

Just recently, I came across the following video showing LinkedIn chief scientist DJ Patil explaining the egocentric networks (networks consisting of an individual and their immediate friends) for a few individuals based on their LinkedIn connections.

Although the individuals in the center of these egocentric networks are unusual (in the sense that they have many more LinkedIn connections than the average user), the video clearly shows that each person is a member of multiple communities where the communities are dense and almost fully connected, while there are fewer connections between the communities. (If any of this sounds familiar, it's because I wrote about this subject a couple of months ago here).

This notion of social structure implies that -- seen from the perspective of a single node -- everything is relatively simple: the world breaks neatly into easily recognizable parts (e.g. family, co-workers, and friends). There are few or no links between the communities because we actively work to keep them separate (more here, on why this is the case).

I've been thinking about the consequences of this local structure for a while, and recently coauthored a paper on this subject with YY Ahn and Jim Bagrow [1]. Here, and in an upcoming blog post, I'll be writing about some insights from that work.

The idea I hope to explore here has to do with the global structure that arises when all nodes in a network have multiple community affiliations, when there is pervasive overlap. In the follow up, I'll explore how a single hierarchical organization of the network can exist in the presence of pervasive overlap.

Untangling the hairball

In the standard view of communities in networks, the global structure is modular [2]. This situation is shown below (left), where the communities are labeled using different colors (image from gephi.org). Modular structure on the global level implies, however, that individual nodes can have only a single community affiliation!
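The usual way to score such a modular partition is Newman and Girvan's modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The sketch below is illustrative (the function name and input format are assumptions); note that `community` maps each node to exactly one label, which is precisely the single-affiliation constraint described above.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a hard partition (one label per node).

    edges:     list of undirected (u, v) pairs, without duplicates
    community: dict node -> community label
    """
    m = len(edges)
    inside = defaultdict(int)       # L_c: edges with both ends in c
    degree_sum = defaultdict(int)   # d_c: total degree of c's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14 ≈ 0.36; a node that belongs to several communities simply has no place in this formulation.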

If every node is a member of more than one community -- and this is clearly the case in the LinkedIn example, as well as in real social networks -- then the global structure of the network is not at all modular. Rather, the network will be a dense mess with no visually discernible structure. The network will look like a ball of yarn ... or a hairball (above, right). In fact, this is precisely the type of structure which has recently been discovered in empirical investigations of a comprehensive set of large networks (social and otherwise) [2, 3].

So the question becomes: How do we find network communities in the hairball? This is the question YY, Jim and I answer in Ref [1]. The trick is that although nodes have many community memberships, each link is mostly uniquely defined. For example, the link you have to one coworker is similar to the link you have to other coworkers. Thus, by formulating community detection as a question of categorizing links rather than nodes, we are able to detect communities in networks with pervasive overlap.

Using our algorithm, for example, we show that dense hairball-networks, such as the word association network (which is what is pictured above, right) contain highly organized internal structure with well defined and pervasively overlapping communities. We're hoping that our algorithm will help reveal new insights about some of the many highly overlapping social networks, such as the LinkedIn data shown above.

Code for our algorithm may be downloaded here; that site also features a neat interactive visualization of the link clustering algorithm.
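The authors' released code is the reference implementation. For intuition only, here is a simplified sketch of the link-similarity idea from Ref [1]: two edges (i, k) and (j, k) that share node k are compared by the Jaccard similarity of the inclusive neighbourhoods of i and j (each node plus its neighbours). Ref [1] builds a full single-linkage dendrogram and cuts it where partition density is highest; the fixed `threshold` used here is an assumption that replaces that step.

```python
from itertools import combinations


def link_communities(edges, threshold):
    """Group edges (not nodes) into communities via edge similarity.

    edges: list of unique undirected (u, v) tuples without self-loops.
    Edges sharing a node are merged when their similarity is >= threshold,
    which equals cutting a single-linkage dendrogram at that level.
    Returns (edge -> label, node -> set of labels).
    """
    neighbours = {}                  # inclusive neighbourhood n+(v)
    for u, v in edges:
        neighbours.setdefault(u, {u}).add(v)
        neighbours.setdefault(v, {v}).add(u)

    parent = {e: e for e in edges}   # union-find over edges

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    incident = {}
    for e in edges:
        for node in e:
            incident.setdefault(node, []).append(e)

    for k, es in incident.items():
        for e1, e2 in combinations(es, 2):
            i = e1[0] if e1[1] == k else e1[1]   # non-shared endpoints
            j = e2[0] if e2[1] == k else e2[1]
            ni, nj = neighbours[i], neighbours[j]
            if len(ni & nj) / len(ni | nj) >= threshold:
                parent[find(e1)] = find(e2)

    label = {e: find(e) for e in edges}
    memberships = {}
    for (u, v), c in label.items():
        memberships.setdefault(u, set()).add(c)
        memberships.setdefault(v, set()).add(c)
    return label, memberships
```

Because every edge receives exactly one label while a node collects the labels of all its edges, overlap comes out naturally: in two triangles that share a single node, that node ends up in both link communities.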

References

[1] Yong-Yeol Ahn, James P. Bagrow and Sune Lehmann. Link communities reveal multiscale complexity in networks. Nature. doi:10.1038/nature09182 (2010).

[2] Santo Fortunato. Community detection in graphs. Physics Reports 486, 75-174 (2010).

[3] J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Statistical Properties of Community Structure in Large Social and Information Networks. International World Wide Web Conference (WWW) (2008).

[4] J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. arXiv:0810.1355v1 (2008).

June 30, 2010 01:02 AM

June 29, 2010

Connected Action

Pierre De Vries Telco Industry Network Map featured in Journal of Social Structure (JOSS) (Made with NodeXL)

The Journal of Social Structure has released its First Annual JoSS Visualization Symposium results and two of the images were generated with NodeXL. One of the two is The Evolution of FCC Lobbying Coalitions by Pierre de Vries, Research Fellow at the Economic Policy Research Center, University of Washington, Seattle.

Pierre has been a deep student of telecommunications policy and regulation in the United States for many years. He has generated a remarkable network map built from the details of more than a decade of filings to the FCC. Companies make these filings when they agree or disagree with a proposed policy. When two companies file in support of (or in opposition to) the same policy, they create a tie between them. Together, these connections form a complex network of coalitions and factions.

http://www.cmu.edu/joss/content/issues/2010jossviz/5_deVries.htm

“The graph is derived from meta-data associated with documents that are filed electronically whenever an organization interacts with the FCC, in accordance with the Administrative Procedures Act. Whenever a letter, comment or other document is filed, the filer provides information on the parties involved, number of pages, relevant proceedings, date, etc.”

“Once the data is cleaned up, an edge list is created in Excel by running another VBA macro. A graph is created from this list with NodeXL, a social network analysis and visualization add-in for Excel 2007. NodeXL’s Fruchterman-Reingold algorithm is used to prepare a preliminary layout; nodes are then moved by hand into visually intelligible positions, respecting the clusters suggested by NodeXL’s implementation of the Wakita-Tsurumi algorithm. Nodes are colored on the basis of eigenvector centrality. The degree of investment that organizations make in lobbying is measured by the total number of filings it made in this proceeding over the period of study, and reflected in the size of the node. This information is obtained by running another VBA macro against the underlying ECFS metadata, and then matching that to the vertices in the graph.”
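The edge-list step described above is a projection of a filer-by-proceeding table onto an organization-to-organization network. The sketch below is a hypothetical Python stand-in for the author's VBA macros: the `(organization, proceeding, stance)` record shape and the rule that only filings on the same side create a tie are assumptions, while node size as total filings follows the quoted description.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cofiling_network(filings):
    """Project filings onto an organization-to-organization coalition network.

    filings: iterable of (organization, proceeding, stance) tuples, one per
             filed document; stance is e.g. "support" or "oppose" (assumed).
    Returns (edges, size):
      edges maps an (org, org) pair to the number of (proceeding, stance)
            groups in which both organizations filed;
      size  maps each organization to its total number of filings.
    """
    size = Counter()
    groups = defaultdict(set)
    for org, proceeding, stance in filings:
        size[org] += 1
        groups[(proceeding, stance)].add(org)
    edges = Counter()
    for orgs in groups.values():
        # sorted() gives each undirected pair one canonical orientation.
        for a, b in combinations(sorted(orgs), 2):
            edges[(a, b)] += 1
    return edges, size
```

The resulting `edges` can be written out as a weighted edge list for a tool such as NodeXL, and `size` supplies the per-vertex value used to scale node size.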

Read more about this industry network at JoSS.




by Marc Smith at June 29, 2010 10:19 PM

June 27, 2010

Data Mining: Text Mining, Visualization and Social Media

Visualizing the London Underground

A reader of this blog pointed me to this interesting project by Matthew Somerville, which displays the live positions of trains on the London Underground. It is a fun visualization, and a great example of free data and developer ecosystems coming together. I wonder where this could go product-wise. Analytics for train problems? Delays? Many of the tube lines run trains so frequently that one just hops on the next one that arrives. I'm guessing that it might be useful in a late-night, mobile scenario, where optimizing dynamically over connections could make a big difference to getting around.


by Matthew Hurst at June 27, 2010 10:04 PM

June 25, 2010

Data Mining: Text Mining, Visualization and Social Media

Visualizing the World Cup

There are a few visualizations of the World Cup knocking around.

As football, compared with many other games popular in the US, is often a low scoring affair, I'm interested in those visualizations that can help me understand if the result was due to consistent skill, or if it was rather a matter of luck (and poor refereeing ;-) With this in mind, I quite like the New York Times approach. The summary statistics can help get at the differences between the teams.


However, ultimately, this is not much more than a table of data points. I don't see anything in the timing that helps me understand anything other than Italy's desperation towards the end of the match.

I suspect that their heat map view of the location of play might be of interest if it covered more than the brief time slices that it currently provides.



 

by Matthew Hurst at June 25, 2010 03:02 AM

June 22, 2010

Complexity and Social Networks Blog

The emergence of international order: the case of MFN treaties in the 1860s

Below is an animation of the spread of MFN treaties in the 1860s. But let me provide a little background before you play it...

One of the major themes that runs through the study of international relations is that of hierarchy and hegemony; that the rules of the international system are determined by a hegemon or small coterie of dominant states, and that the study of the international system is really a story of a contest for hegemony (cf Kindleberger, Organski, Gilpin, among many others).

The relatively free trade regime that emerged in the 1860s is often taken as a case study of the role of hierarchy in the international system, in which, the story goes, the hegemon, Great Britain, imposed on the international order a set of rules that served its own interest in free trade. This is a perspective I critiqued in a paper in World Politics about a decade ago. In particular, I argued that the international order was emergent, built on a foundation of bilateral most-favored-nation (MFN) treaties, and the result of the interplay of domestic interests with the rapidly evolving international economy. Specifically, the rise of industrialism was associated with lower transportation costs and economies of scale in production. The cost to producers of industrial and differentiated goods (but not homogeneous goods) of being discriminated against in another state's markets thus must have increased through the 19th century. This trend set the stage for an "epidemic" of MFN treaties, starting with a treaty between France and Britain in 1860. This treaty, I argue, created a concern among other industrializing countries that their goods would be shut out of France, decreasing the price they could receive for differentiated goods and undermining the competitiveness of their industrial producers. These countries signed treaties with France, which then created additional concerns about being shut out of French (and other) markets, spurring yet more treaties. Britain, because it generally had low trade barriers already, was in a relatively peripheral position in this treaty network.

So here is the animation (designed by Sune Lehmann). Key things to focus on include the temporal order of treaty signings, the role of geography in determining who signed treaties with whom, and the position of Great Britain in the emerging network.

Note: an edge indicates the presence of an MFN treaty between two countries, and node size is proportional to degree.



Here's the paper: Lazer_Free-Trade-Epidemic-1999_World-Politics.pdf

Here are the data: io3.xls

And some relevant references:

David Lazer, "The Free Trade Epidemic of the 1860s and Other Outbreaks of Economic Discrimination," World Politics, July 1999, 447-483.

Douglas Irwin, "Multilateral and Bilateral Trade Policies," in Jaime de Melo and Arvind Panagariya, eds., New Dimensions in Regional Integration (Cambridge: Cambridge University Press, 1993).

Peter Marsh, Bargaining on Europe: Britain and the First Approach to a European Economic Community, 1860-1892 (New Haven: Yale University Press, 1999).

June 22, 2010 01:16 PM

June 21, 2010

Data Mining: Text Mining, Visualization and Social Media

Groups A and C Getting Most World Cup Attention

A quick formulation of queries against BlogPulse, of the form "world cup" AND (team1 OR team2 OR team3 OR team4), suggests that Groups A and C are getting the most attention from the blogosphere.
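Generating one query per group keeps the comparison consistent across groups. This is a small illustrative helper, not part of BlogPulse; wrapping multi-word team names in double quotes (mirroring the "world cup" phrase) is an assumption about the search syntax. The group lists are the 2010 draw for Groups A and C.

```python
def group_query(teams, phrase="world cup"):
    """Build a boolean query of the form used in the post.

    Multi-word team names are wrapped in double quotes, like the phrase
    itself (an assumption about the search engine's phrase syntax).
    """
    terms = [f'"{t}"' if " " in t else t for t in teams]
    return f'"{phrase}" AND (' + " OR ".join(terms) + ")"


GROUP_A = ["South Africa", "Mexico", "Uruguay", "France"]
GROUP_C = ["England", "USA", "Algeria", "Slovenia"]
```

For example, `group_query(GROUP_C)` produces `"world cup" AND (England OR USA OR Algeria OR Slovenia)`.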


by Matthew Hurst at June 21, 2010 12:34 AM

June 20, 2010

UMBC Ebiquity

Semantic overflow, a collaboratively edited question and answer site for the Semantic Web

Semantic Overflow is a great way for the Semantic Web community to help one another with questions, problems and education. It was started in November 2009 using the Stack Overflow framework hosted by Stackexchange.

It’s still growing, with 261 questions submitted and just over 450 registered users, about a third of whom have enough reputation to vote. Here’s an example: Ian Davis of Talis asked “What is a good elevator pitch for Linked Data?” and got 17 answers.




Like the parent Stack Overflow system, Semantic Overflow is a blend of a forum, wiki and recommendation site. It lets users ask, tag and answer questions, and also allows those with a sufficient reputation score to vote on and even edit both the questions and community-submitted answers.

The traditional way to ask a community technical questions is through a mailing list or a Web-based forum. The Stack Overflow model offers many advantages, so I hope this site continues to gain traction.

If you want to monitor the site for new questions, you’ll find the feed of the 30 most recently submitted questions useful.
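One simple way to monitor such a feed is to remember which entry IDs you have already seen and report only the new ones. The Python sketch below assumes the feed is Atom XML. The sample document and its IDs are hypothetical, and fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def new_entries(feed_xml, seen_ids):
    """Return (id, title) pairs for Atom entries whose id is not in seen_ids.

    seen_ids is updated in place, so the next call reports only newer items.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        title = entry.findtext(f"{ATOM}title", default="")
        if entry_id and entry_id not in seen_ids:
            seen_ids.add(entry_id)
            fresh.append((entry_id, title))
    return fresh

# Hypothetical feed content, for illustration only.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>q1</id><title>What is a good elevator pitch for Linked Data?</title></entry>
  <entry><id>q2</id><title>Hypothetical question</title></entry>
</feed>"""

seen = set()
print(new_entries(sample, seen))  # both entries are new on the first call
print(new_entries(sample, seen))  # [] on the second call
```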

by Tim Finin at June 20, 2010 04:32 PM

June 19, 2010

Connected Action

Book: Flier and Cover Art – Analyzing social media networks with NodeXL: Insights from a connected world

The production team at Morgan Kaufmann has created a cover and a flier for the forthcoming book:

2010 – June – NodeXL Book Flyer.

Written and edited by Derek Hansen, Ben Shneiderman and Marc Smith, the book contains contributed chapters on sample social media systems:

[Chapter 10]: Twitter: Conversation, Entertainment and Information, All in One Network!

By Vladimir Barash and Scott Golder

[Chapter 11]: Visualizing and Interpreting Facebook Networks

By Bernie Hogan

[Chapter 12]: WWW Hyperlink Networks

By Robert Ackland

[Chapter 13]: Flickr: Linking People, Photos, and Tags

By Eduarda Mendes Rodrigues and Natasa Milic-Frayling

[Chapter 14]: YouTube: Contrasting Patterns of Interaction and Prominence

By Dana Rotman and Jennifer Golbeck

[Chapter 15]: Wiki Networks: Networks of Creativity and Collaboration

By Howard T. Welser, Patrick Underwood, Dan Cosley, Derek Hansen, and Laura Black

This handy poster includes details about the book's contributors, chapters and cover (which you can also see below):

[Image: cover of Analyzing Social Media Networks with NodeXL]

Analyzing Social Media Networks with NodeXL: Insights from a Connected World

by Marc Smith at June 19, 2010 06:13 PM

June 17, 2010

Data Mining: Text Mining, Visualization and Social Media

Visualizing Wireless Signals in AR

I really like the idea explored in this film: using AR-like experiences to see parts of the real world that are invisible to the human eye.

Wireless in the world 2 from timo on Vimeo.

by Matthew Hurst at June 17, 2010 03:10 PM

June 16, 2010

UMBC Ebiquity

Infochimps provides API for their Twitter and Census datasets

Infochimps now offers a query API for two interesting datasets: a Twitter collection and US Census data.

The Twitter data covers 500M tweets from 35M users, collected between March 2006 and November 2009. The API currently includes the following services:

  • Trstrank – a trust metric for Twitter users based on network centrality (see trst.me):

    http://api.infochimps.com/soc/net/tw/trstrank.json?screen_name=SarahPalinUSA

  • Wordbag – returns the 100 tokens (i.e., words) that a particular Twitter user tweets more often than the average Twitter user.

    http://api.infochimps.com/soc/net/tw/wordbag.json?screen_name=ladygaga

  • Influencer metrics – replies in/out and retweets in/out for a given user

    http://api.infochimps.com/soc/net/tw/influence.json?screen_name=algore

  • Conversations – find interactions between two users. Currently this just yields direct messages but will include retweets and mentions later. For example, check out conversations between Lady Gaga and Sarah Palin:

    http://api.infochimps.com/soc/net/tw/conversation.json?user_a_id=14230524&user_b_id=65493023
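The Wordbag service is described only as returning the tokens a user tweets more often than the average user. One plausible way to compute this is to rank tokens by the ratio of the user's relative frequency to an add-one-smoothed background frequency. The Python sketch below does that. It is an assumption, not Infochimps' documented method, and the naive whitespace tokenizer is also an assumption.

```python
from collections import Counter

def wordbag(user_tweets, background_tweets, k=100):
    """Return up to k tokens the user uses more often than the background corpus.

    Score = user relative frequency / smoothed background relative frequency.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    user = Counter(w for t in user_tweets for w in t.lower().split())
    background = Counter(w for t in background_tweets for w in t.lower().split())
    user_total = sum(user.values())
    vocab = set(background) | set(user)
    bg_total = sum(background.values()) + len(vocab)  # add-one smoothing
    scores = {}
    for word, count in user.items():
        user_rate = count / user_total
        bg_rate = (background[word] + 1) / bg_total
        if user_rate > bg_rate:  # keep only over-represented tokens
            scores[word] = user_rate / bg_rate
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

user = ["monster ball tour", "monster love"]
bg = ["the tour", "the love", "the day"]
print(wordbag(user, bg))  # ['monster', 'ball', 'love', 'tour']
```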

Pricing varies with use and ranges from “Baboon” (free for 100K calls/month) to “Golden Ape” ($4000/month for 15M calls/month).
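To get a rough sense of scale, you can derive the effective per-call price of the top tier from the figures above. The small Python calculation below assumes the full monthly quota is used. It ignores overage charges and intermediate tiers, which the post does not describe.

```python
def cost_per_thousand(monthly_price, calls_per_month):
    """Dollars per 1,000 calls if the full monthly quota is used."""
    return monthly_price / calls_per_month * 1000

golden_ape = cost_per_thousand(4000, 15_000_000)
print(f"${golden_ape:.3f} per 1,000 calls")  # $0.267 per 1,000 calls
```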

by Tim Finin at June 16, 2010 03:14 AM