News aggregator

Let Them Print Cake

CCI Bloggers - 17 hours 46 min ago

There’s much food for thought in this post from Mike Wesch about how affordable 3D printing might influence the way we construct our identities. The video “Why I Love My 3D printer” is also pure gold.

The governance of sustainability: how companies manage their corporate responsibilities

Creative Economy News and Research - 3 February 2012 - 3:29pm
Creator:  Alice Klettner

This study reviews the structure of corporate responsibility in a sample of twelve large, listed Australian companies.


Targeting support for high needs students in primary schools

Creative Economy News and Research - 3 February 2012 - 12:44pm
Creators: Max Angus, Harriet Olney

Australian governments are adding resources to schools to achieve better educational outcomes through a series of National Education Partnerships between the Australian Government and state governments.


The State of Marketing 2011: Unica’s Annual Survey of Marketers

Creative Economy News and Research - 2 February 2012 - 2:56pm

The opportunity to understand what marketers are thinking and doing during a period of unprecedented change and increasing expectations has made Unica’s Annual Survey of Marketers a must read.


CCI Report highlights role of social media in floods coverage and response

Creative Economy News and Research - 1 February 2012 - 4:02pm
Creators: Kate Crawford, Jean Burgess, Frances Shaw, Axel Bruns

Social media sites Twitter and Facebook played a crucial role in disseminating information during the 2011 Queensland floods.


Boost for Australian content and innovation

Creative Economy News and Research - 1 February 2012 - 3:53pm

Australian content should receive a boost in an increasingly convergent media world, if the findings of the Federal Government’s interim report – released today – are carried into policy.


More Twitter Metrics: Metrify Revisited

CCI Bloggers - 31 January 2012 - 6:04pm

About a month ago I introduced my new Gawk script metrify.awk, which generates a wide range of Twitter metrics for a given Twapperkeeper/yourTwapperkeeper hashtag or keyword archive. Even as I was writing those posts, though – and certainly while playing with the language metrics I discussed in my last post – I started to find a few areas where metrify could provide even more information on the dataset. So, the time has come for a first service release which upgrades metrify.awk to add some more functionality (and fix a few inconsistencies along the way). This is a revision rather than a full rewrite of the script, so let’s call it metrify 1.2; it’s now available for download here, where it replaces the older version.

As before, the new version of metrify.awk is called as follows:

gawk -F , -f metrify.awk time="[year|month|day|hour|minute]" [divisions=x,y,z,…] [skipusers=1] input.csv >metrics.csv

(divisions defaults to ‘90,99’ – i.e. a 90%/9%/1% split of the userbase – if it is not specified).

In this post, I won’t go from scratch through the entire range of metrics that metrify.awk generates; my original four-part post is still sufficient for that purpose. Rather, I’ll focus only on the major changes in this new revision, which relate mainly to part two of that series (and I’ve noted the updates in those posts as well, to avoid confusion): the metrics over time.

Changes to Metrics over Time

The first table generated by metrify shows the metrics over the chosen timeframe (e.g. day or hour), but it now contains a number of additional data points. The changes only concern the columns which contain metrics for the various user percentiles which are defined with the ‘divisions’ argument. Rather than providing information only on the number of users from each percentile which are actively participating during each timeframe (expressed as a percentage of the total number of currently active users), as metrify 1.0 did, revision 1.2 provides a number of further metrics:

  • the number of users from each percentile which are currently active, and what percentage of the total currently active userbase that number represents;
  • the number of tweets from users in each percentile which were made during the timeframe, and what percentage of the total current volume of tweets that number represents.
     

Here’s a comparison of the relevant output columns between versions 1.0 and 1.2:

metrify.awk 1.0 | metrify.awk 1.2

(no equivalent) | number of current users from least active x% (< u tweets)
lowest x% users (<= u tweets) | % of current users from least active x% (< u tweets)
(no equivalent) | number of tweets from least active x% (< u tweets)
(no equivalent) | % of tweets from least active x% (< u tweets)

(no equivalent) | number of current users from > x% group (> u-1 tweets; a of n users)
users > x% (> u tweets; a of n users) | % of current users from > x% group (> u-1 tweets; a of n users)
(no equivalent) | tweets from > x% group (> u-1 tweets; a of n users)
(no equivalent) | % of tweets from > x% group (> u-1 tweets; a of n users)

(no equivalent) | number of current users from > y% group (> v tweets; b of n users)
users > y% (> v tweets; b of n users) | % of current users from > y% group (> v tweets; b of n users)
(no equivalent) | tweets from > y% group (> v tweets; b of n users)
(no equivalent) | % of tweets from > y% group (> v tweets; b of n users)

  
(with the default settings, x% would be 90% and y% would be 99%; a, b, u, v, and n would depend on the dataset).

So, it now becomes possible not only to track what percentage of the total number of currently active users are from each of the percentiles we have defined, but also what percentage of the total volume of tweets during each period is contributed by each of the user percentiles. By way of example, here’s a comparison of those metrics for the #egypt dataset during February 2011:

 
Active users in the 90/9/1 user percentiles as percentage of total active userbase


Tweets by users in the 90/9/1 user percentiles as percentage of total current tweet volume

Unsurprisingly, the two charts move together – the greater the presence of a specific user group in the total active userbase, the greater their contribution to the current tweet volume – but only the second chart also tells the story of just how dominant the most active one per cent of users really is. During the final days of February, they constitute slightly less than 20% of the total participating userbase – but more than half of all tweets posted at that time originate from them.
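To make the difference between the two percentages concrete, here is a minimal sketch in Python (not Gawk, and not taken from metrify.awk itself). The function name period_shares, the input layout, and the sample data are all made up for illustration; the percentile labels are assumed to have been assigned to each user beforehand.

```python
from collections import defaultdict

def period_shares(tweets, group_of):
    """tweets: iterable of (period, username) pairs, one pair per tweet.
    group_of: dict mapping username -> percentile label (assumed precomputed).
    Returns {period: {label: (percent_of_active_users, percent_of_tweets)}}."""
    users = defaultdict(lambda: defaultdict(set))    # period -> label -> active users
    counts = defaultdict(lambda: defaultdict(int))   # period -> label -> tweet count
    for period, user in tweets:
        label = group_of[user]
        users[period][label].add(user)
        counts[period][label] += 1

    result = {}
    for period in counts:
        total_users = sum(len(u) for u in users[period].values())
        total_tweets = sum(counts[period].values())
        result[period] = {
            label: (100.0 * len(users[period][label]) / total_users,
                    100.0 * counts[period][label] / total_tweets)
            for label in counts[period]
        }
    return result

# Made-up data: one very active user and four occasional users on a single day.
sample = [("2011-02-28", "lead")] * 6 + [("2011-02-28", u) for u in ("a", "b", "c", "d")]
groups = {"lead": "> 99%", "a": "> 0%", "b": "> 0%", "c": "> 0%", "d": "> 0%"}
shares = period_shares(sample, groups)
```

In this toy example the single lead user makes up 20% of the day's active users but contributes 60% of its tweets, which is the kind of gap that only the second metric reveals.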

(At a later stage, I may also add functionality to track the use of different tweet types over time, by the different percentiles – but that’s a feature for metrify 1.5 or so.)

Other Changes

The only other notable change in this new revision is that the third of the tables generated by metrify.awk, which describes the participating users themselves, has gained a further column, ‘percentile’. This contains a simple descriptor of which of the various percentiles a user has been placed in, and thereby allows for an easier filtering of the list (using Excel’s data filter functions). For the standard 90/9/1 division of the userbase, fields in the column would contain one of the following four options for each user:

  • > 99% – user belongs to the top 1% of most active users
  • > 90% – user belongs to the top 10% of most active users, but is outside the top 1%
  • > 0% – user belongs to the 90% of least active users
  • none – user appears only in @reply or retweet mentions by others, but does not actively contribute to the hashtag
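As a rough illustration of how such labels could be assigned, here is a minimal Python sketch. It is not the logic of metrify.awk: the function name percentile_labels is hypothetical, and the rule used here (rank users by tweet count, break ties alphabetically, and cut at the given percentages of the ranked list) is an assumption. The script itself appears to work with tweet-count thresholds, so its handling of ties may well differ.

```python
def percentile_labels(tweet_counts, mentioned_only=(), divisions=(90, 99)):
    """tweet_counts: dict username -> number of tweets sent (at least 1).
    mentioned_only: usernames that appear only in @replies or retweets.
    Returns dict username -> label such as '> 99%', '> 90%', '> 0%' or 'none'."""
    # Rank from least to most active; ties are broken by name (an assumption).
    ranked = sorted(tweet_counts, key=lambda u: (tweet_counts[u], u))
    n = len(ranked)
    labels = {}
    for rank, user in enumerate(ranked):
        label = "> 0%"
        for d in sorted(divisions):
            # At least d% of all users rank below this user.
            if rank * 100 >= d * n:
                label = "> %d%%" % d
        labels[user] = label
    for user in mentioned_only:
        labels.setdefault(user, "none")
    return labels

# Made-up data: 100 users, where user number i has sent i + 1 tweets.
counts = {"u%03d" % i: i + 1 for i in range(100)}
labels = percentile_labels(counts, mentioned_only=["ghost"])
```

With these made-up counts, the single most active user is labelled '> 99%', the next nine '> 90%', the remaining ninety '> 0%', and the mention-only account 'none'.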

 
Additionally, and less obviously, I’ve also rewired how users are tracked through the dataset. In principle, this should be a very simple process: each user has both a unique numerical Twitter user ID, and a unique alphanumeric username. However, for some esoteric reason the user IDs returned by the Twitter search and streaming APIs, which Twapperkeeper uses to retrieve its datasets, do not always match, especially for older archives (or perhaps for older accounts?); the same user may have two completely different user IDs (thanks to John O’Brien for the details on this). This means that using the user IDs to track user activities in the dataset is unreliable. Usernames, however, may also be changed by the user at any point – @KRuddMP could become @KRuddPM when you least expect it. (Sorry, couldn’t resist!)

Still, as this doesn’t happen all too often, and given the unreliability of the numerical user IDs, metrify does use (lowercase) usernames as its internal tracking ID. The final output itself shows usernames in their properly capitalised form as we’ve first encountered it in tweets by the users themselves (they may also have chosen to change that capitalisation at a later date, though; we’re not checking for that), wherever possible; for users who are only mentioned, but don’t themselves tweet actively, we use the capitalisation which we first encounter.
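The tracking rule described above can be sketched in a few lines of Python. This is an illustration only, not code from metrify.awk: the class UserTracker and its method names are hypothetical.

```python
class UserTracker:
    """Track users by lowercase username while remembering a display form."""

    def __init__(self):
        self.display = {}         # lowercase key -> capitalisation to show in output
        self.has_tweeted = set()  # keys of users seen as authors of a tweet

    def seen_as_author(self, username):
        key = username.lower()
        if key not in self.has_tweeted:
            # The first capitalisation used by the user themselves takes
            # precedence over any form previously seen in a mention.
            self.display[key] = username
            self.has_tweeted.add(key)
        return key

    def seen_in_mention(self, username):
        key = username.lower()
        # Only used for display if nothing has been recorded for this user yet.
        self.display.setdefault(key, username)
        return key

tracker = UserTracker()
tracker.seen_in_mention("kruddmp")    # mentioned first, in lower case
tracker.seen_as_author("KRuddMP")     # the user's own tweet fixes the display form
tracker.seen_as_author("KRUDDMP")     # a later change of capitalisation is ignored
tracker.seen_in_mention("SomeOne")    # a user who is only ever mentioned
tracker.seen_in_mention("someone")
```

All variants of a name collapse onto one lowercase key, so a user's activity is counted together regardless of how the name was typed in a given tweet.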

Finally, one caveat remains: as before, metrify will take quite some time to process a large dataset, and is likely to run out of memory if it’s trying to generate full user metrics for such datasets. (There doesn’t seem to be any way to allocate more memory to Gawk – or to the shell it runs in – so there’s little I can do to fix this.) Where full, detailed per-user metrics aren’t required, use the skipusers=1 command-line argument, and Gawk will only output the number of tweets contributed by each user, and the percentile they’ve been allocated to on that basis. And it will take a lot less time to do so.

So much, then, for this service update of metrify.awk. In a follow-up post in a few days, I’ll show how metrify metrics can also be imported into Gephi to turbo-charge our network visualisations of Twitter @reply and retweet networks…

Law must navigate the seas of social media

Creative Economy News and Research - 31 January 2012 - 2:57pm

We must move past the culture of blame, argues Chris Harrison for The Punch.


A fair day’s pay for a fair day’s play

Creative Economy News and Research - 31 January 2012 - 2:24pm

The women’s tournament is not equal in work to the men’s, writes Mark Gottlieb for The Punch.


Students as learning designers: using social media to scaffold the experience

Creative Economy News and Research - 31 January 2012 - 1:16pm
Creators: Miriam Tanti, Leanne Cameron

The ‘students as learning designers’ approach challenges transmission models of pedagogy and requires teachers to relinquish some control to their students so that they might have the space to experiment and discover how to learn.


Digital futures for cultural and media studies

Creative Economy News and Research - 31 January 2012 - 11:03am

As they play with their digital devices and online games, children may unknowingly be making up the kind of democracy we will have tomorrow.


Privacy in networked places - a talk by Danah Boyd

Creative Economy News and Research - 31 January 2012 - 10:36am

There is a widespread myth that young people don’t care about privacy. Embedded in this myth is an assumption that participation in public social media like Facebook and Twitter indicates a rejection of privacy. Yet, just because people want to participate in public life doesn’t mean that they want everything they do to go down on their permanent record or to be publicized for the whole world to see. This talk will examine how young people understand privacy and the strategies they take to achieve privacy in networked publics.


The truth about privacy

Creative Economy News and Research - 31 January 2012 - 10:15am

As technology makes our world more transparent, handling customer data is both a risk and an opportunity for businesses.


Mapping Australian higher education

Creative Economy News and Research - 30 January 2012 - 10:04pm
Creator:  Andrew Norton

Australia’s higher education system is entering one of its most significant years in recent history.


Call for Papers: Emerging Methods for Digital Media Research

CCI Bloggers - 30 January 2012 - 8:40pm

Another brief announcement: along with our CCI colleague Larissa Hjorth, Axel and I are looking forward to editing a special issue of the Journal of Broadcasting & Electronic Media (JOBEM) on the theme “Emerging Methods for Digital Media Research”, due for publication in March 2013. If you work in a related area, please consider submitting an abstract by the March deadline. Details follow below.

Emerging Methods for Digital Media Research
Special Themed Issue of the Journal of Broadcasting & Electronic Media (JOBEM), March 2013.

Guest Editors:
Jean Burgess (QUT)
Axel Bruns (QUT)
Larissa Hjorth (RMIT)
ARC Centre of Excellence for Creative Industries & Innovation (http://cci.edu.au/)

Editor: Zizi Papacharissi

With the rise of ‘big data’, locative media, and smartphones, existing media and communication studies methods are being recombined, reconfigured and replaced alongside their objects of study. This special issue of JOBEM seeks to expose new research methods for understanding the changing nature of the content industries, the impact of digital media on the practices of creative workers, and the experiences and practices of everyday users of digital media technologies.

We welcome papers based in the humanities and social sciences that reflect on, discuss or critique current methodological trends in digital media research, shedding light on the following questions:
1. Where are the emerging methodological gaps – are there pressing research problems that require the development of new methods, techniques and tools?
2. Where are there needs for new combinations of methods, within or across disciplines?
3. What are the implications for future pedagogical models in internet, media and communication studies, including doctoral education and other forms of research training?

We especially welcome papers grounded in the experience of conducting empirical digital media research. However, we will give preference to papers that contextualise, historicise, and reflect on current methodological trends, rather than simply reporting on the applications or results of new methods.

Abstracts of 250 words are due by 31 March, 2012. Depending on the number of abstracts received, we may shortlist submissions at this stage. Please email your abstract and a list of 3 or 4 suggested peer reviewers to: jobem.edm@gmail.com.

Full articles of no more than 7000 words should be submitted on or before 1 August, 2012 at: http://mc.manuscriptcentral.com/hbem (select “Special Issue: Emerging Digital Methods” as a manuscript type). Manuscripts should conform to the guidelines of the Journal of Broadcasting & Electronic Media.

Creating Basic Twitter Language Metrics

CCI Bloggers - 28 January 2012 - 12:27pm

OK, this may be a somewhat esoteric subject for researchers who mainly work with Twitter data from specific countries and cultures, but over the past few weeks I’ve been working on a paper that analyses Twitter activities in the #egypt and #libya hashtags – and as part of that work, I’ve been interested in exploring the interactions between users tweeting in Arabic and users tweeting in other languages (mainly in English). Unfortunately, there’s no reliable means of identifying the language of specific tweets, or of the users who post them; while the Twitter API provides an ISO language code (e.g. ‘en’ for English, ‘no’ for Norwegian, etc.) for each tweet, this is drawn simply from the overall language setting of the user’s account, and not specific to each individual tweet itself. For users who alternate between languages in their tweeting, all tweets will be tagged with their chosen language code; for users who haven’t bothered to change their Twitter profile settings away from the default English, all their tweets will be tagged ‘en’, regardless of their actual language.

So far, so unhelpful. Further, short of running every tweet through some form of automatic language recognition tool (using Google Translate or a similar mechanism, for example) – which would be extremely time-consuming for Twitter archives upwards of a few thousand tweets – it is prohibitively difficult to identify the exact language of each tweet, not least also because of the 140 character limit of tweets. In theory, if we had word corpora for all major languages, we could cross-check each tweet against those corpora to see what words from what language occur most frequently – but again, that process would be extremely time-consuming, and would probably have serious difficulties with the abbreviations and contractions which Twitter users commonly employ to stay within that limit.

A much simpler approach – which does generate somewhat less conclusive results, though – works by examining the character sets used in tweets. This is able to make only relatively broad distinctions, but it’s good enough for what I’m trying to achieve with my #egypt/#libya datasets: here, a quick qualitative look at the data suggests that the major division is between Arabic tweets and tweets in English (and to some extent in other European languages) – so the main challenge is to distinguish between Latin and Arabic character sets. This we can do, even just with a basic Gawk script.

Twitter datasets as they are generated by our standard hashtag tracking solution, yourTwapperkeeper, are available in UTF-8 encoding, leaving virtually all characters and character sets intact. Each character is assigned a specific character code, and for historical reasons, the basic characters of the Latin script (unaccented letters, standard punctuation marks, etc.) retain their traditional ASCII codes, with values below 128; beyond that range, we’re moving into accented letters, more unusual punctuation marks, and non-Latin character sets, all of which UTF-8 stores as sequences of two or more bytes, each with a value above 127. Sadly, our preferred tool for processing yourTwapperkeeper datasets, Gawk, doesn’t cope all that well with these multi-byte characters: it handles individual bytes (i.e. values below 256) fine, but it reads each multi-byte character as multiple single-byte characters. At least on a Windows PC, there doesn’t seem to be any way to change that behaviour, either.

However, that’s still good enough for our immediate purpose of distinguishing between Latin and non-Latin (i.e. mainly English and Arabic) tweets. As it turns out, Gawk consistently sees Arabic characters as a sequence of two codes: of either 216 (Ø) or 217 (Ù), followed by another character with a code above 127. So, for a basic distinction between tweets using Latin and tweets using non-Latin scripts, we simply need to count the number of high-ASCII characters (with a code above 127) which Gawk sees in each tweet, and to set a threshold below which a tweet is still classified as ‘Latin’ (to allow tweets that use accented characters or ‘fancy’ quotation marks to be classed as Latin). Through trial and error, I’ve found that a threshold of 20 (i.e. ten Arabic or other non-Latin characters) seems to work reasonably well: few tweets in languages using the Latin alphabet will be miscounted as ‘non-Latin’, even if they contain a number of umlauts or accented characters, while tweets in Arabic, Hebrew, Greek, Chinese, Korean, and other non-Latin alphabets are reliably recognised.
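The byte-counting idea can be reproduced outside Gawk as well. The following Python sketch is only an illustration of the approach, not a port of the script: the function names high_byte_count and is_non_latin are hypothetical, and encoding the text to UTF-8 bytes stands in for the byte-wise view of the data that Gawk has here.

```python
def high_byte_count(text):
    """Count the bytes with a value above 127 in the UTF-8 encoding of text."""
    return sum(1 for b in text.encode("utf-8") if b > 127)

def is_non_latin(text, tolerance=20):
    """Classify a tweet as non-Latin if it has more than `tolerance` high bytes."""
    return high_byte_count(text) > tolerance

# "café": the accented letter is stored as two high bytes.
accented = high_byte_count("caf\u00e9")

# Three Arabic letters (meem, sad, reh): two high bytes each.
arabic = "\u0645\u0635\u0631"
arabic_bytes = list(arabic.encode("utf-8"))

# A German phrase with three non-ASCII letters stays well under the tolerance.
german_is_non_latin = is_non_latin("Gr\u00fc\u00dfe aus K\u00f6ln")

# Eleven two-byte characters give 22 high bytes, just over a tolerance of 20.
eleven_is_non_latin = is_non_latin("\u0645" * 11)
```

Because the comparison is strictly 'greater than', a tolerance of 20 means that a tweet needs at least eleven two-byte characters (or seven three-byte ones) before it is counted as non-Latin.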

We could use this to mark up the language of every line in a yourTwapperkeeper archive – but that’s not necessarily very useful or interesting. Instead, the script below operates on a user-by-user basis: for each user, it counts the number of their tweets which were above the ‘non-Latin’ threshold, and also calculates a language_ratio value: the percentage of their tweets which used non-Latin characters. The script accepts an optional ‘tolerance’ parameter, to set the ‘non-Latin’ threshold: a typical way to use it would be

gawk -F , -f userlanguage.awk tolerance=20 input.csv >output.csv

(tolerance defaults to zero if it isn’t set).

# userlanguage.awk - Extract stats on the language use of each user, as metrics for network visualisation in Gephi
#
# this script takes a Twapperkeeper CSV/TSV archive of tweets, and calculates for each user a ratio
# indicating how many of their tweets were in non-Latin charactersets
#
# output is in a format ready to be imported as a node list into the Gephi Data Laboratory
# on import, note that new data columns must be imported as 'float' type
#
# the script skips the first line, expecting that it contains header information
#
# script expects an optional numerical "tolerance" parameter, to set how many high-ASCII (non-Latin) characters a tweet may contain while still counted as Latin script
# set tolerance to ~20 to treat most accented European languages as Latin (note that Gawk will count some UTF-8 characters as two or more high-ASCII characters)
# default value for tolerance is 0
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# nodes,id,label,user_tweets,user_highASCII_tweets,language_ratio
# (language_ratio is a value between 1 = no Latin tweets and 0 = 100% Latin tweets)
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

BEGIN {
    getline
    if(!tolerance) tolerance = 0;    # highASCII tolerance level: default 0

    for(char = 0; char < 256; char++) {
        charnum[sprintf("%c", char)] = char
    }

    print "Nodes" FS "Id" FS "Label" FS "user_tweets" FS "user_highASCII_tweets" FS "language_ratio"
}

{
    nodename[tolower($3)] = $3
    node[tolower($3),"tweets"]++

    highASCII = 0
    for(char = 1; char<=length($1); char++) {
        if(charnum[substr($1, char, 1)] > 127) highASCII++    # count number of high ASCII (>127) characters in tweet; note: some UTF-8 characters count as multiples
    }

    if(highASCII > tolerance) node[tolower($3),"highASCII"]++
}

END {
    for(name in nodename) {
        print name FS name FS nodename[name] FS node[name,"tweets"] FS node[name,"highASCII"] FS node[name,"highASCII"] / node[name,"tweets"]
    }
}


The resulting data can be used in a number of ways. For one, we might divide the total userbase into three groups: users who mainly used Latin characters (with a language_ratio below 0.33); users who mainly used non-Latin characters (language_ratio > 0.66); and users posting in a mix of languages (language_ratio between 0.33 and 0.66). If we further combine this grouping with the distinctions between lead users, highly active users, and less active users which the metrify.awk script makes possible, we now have the ability to examine the prevalence of different languages across these different groups – for #egypt during February 2011, this is what results, for example:

An interesting result: while ‘Latin’ (in this case, mainly English-speaking) users dominate overall, they’re mainly found amongst the less engaged 90% of users – they’re making or retweeting a small number of hashtagged comments about the situation in Egypt during February. The most engaged one per cent of users contain a much larger percentage of Arabic (i.e. non-Latin) speakers, as well as a sizeable proportion of users tweeting in a mix of languages and character sets.

(Note: of course, speakers of languages such as Chinese, Korean, Japanese, Greek, Hebrew, Russian, etc. will be included in the ‘non-Latin’ group here, and speakers of many European languages other than English will be counted amongst the ‘Latin’ group. In many cases, this will be a problem, and our approach here doesn’t allow for easy distinctions between, say, English and French, or Arabic and Hebrew. For our present purposes, however, that’s a negligible problem – few ‘non-Latin’ languages other than Arabic, and few ‘Latin’ languages other than English, are present in the #egypt dataset to any significant extent.)
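The three-way grouping by language_ratio, and its combination with the percentile groups from metrify.awk, can be sketched as follows. This is a Python illustration rather than part of either Gawk script; the function names, the sample users, and the treatment of the exact boundary values 0.33 and 0.66 as 'mixed' are assumptions.

```python
from collections import Counter

def language_group(ratio):
    """Map a language_ratio (0 = all Latin, 1 = all non-Latin) to a group."""
    if ratio < 0.33:
        return "Latin"
    if ratio > 0.66:
        return "non-Latin"
    return "mixed"

def crosstab(ratios, percentiles):
    """Count users for each (percentile label, language group) pair.
    ratios: dict username -> language_ratio
    percentiles: dict username -> percentile label"""
    return Counter((percentiles[u], language_group(r)) for u, r in ratios.items())

# Made-up values for illustration only.
ratios = {"alice": 0.0, "bob": 0.5, "carol": 0.9, "dave": 0.1}
percentiles = {"alice": "> 0%", "bob": "> 99%", "carol": "> 99%", "dave": "> 0%"}
table = crosstab(ratios, percentiles)
```

Dividing each count by the size of its percentile group would then give the prevalence of each language group within that percentile.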

Additionally, the output of userlanguage.awk is also designed to be easily imported into Gephi as an additional source of data on the users in the network. Assuming we’ve already created a network (for example showing @replies and retweets) for our dataset, using the Twitter usernames (normalised to lower case) as node IDs, we can now use the Data Laboratory to import the language data into the nodes table, as additional columns. Here, it’s important to make sure the numerical metrics generated by userlanguage.awk (user_tweets, user_highASCII_tweets, language_ratio) are imported as columns of the ‘Float’ type, in order to be able to use them effectively in Gephi.

(I’ll say much more about importing Twitter metrics data into Gephi in a future blog post – stay tuned.)

Once imported, these metrics are now available to be used for various purposes: as a means of sizing or colouring nodes in the network, or as criteria for filtering it. To finish off for now, here’s a simple example, which shows @replies and retweets in the #egypt hashtag during February 2011. I’ve used the language_ratio value as the guide for the colour scale here: blue indicates a language_ratio close to zero (predominantly tweeting in Latin characters); green a language_ratio close to one (predominantly tweeting in non-Latin characters); with a gradient of colours between them. Connections between users are coloured according to the language ratio of the sender. (Full graph here – PNG, 9 MB.)

 
There’s an obvious language divide here – English- and Arabic-speaking users are mainly tweeting amongst themselves. But there are also a good number of connections across the divide – and for these, given the graph above, the most active #egypt participants are disproportionately responsible: mixed-language users are much more likely to be found in that group than in any of the others.

And that’s it for now – more on my language analysis of #egypt and #libya when the paper gets published, and more on using Twitter metrics in Gephi in a future post!

Broadcasting Services Amendment (Regional Commercial Radio) Bill 2011

Creative Economy News and Research - 27 January 2012 - 3:30pm
Creator:  Rhonda Jolly

The purpose of this Bill is to amend the Broadcasting Services Act 1992 (the BSA) with the aim of easing the regulatory burden on regional commercial radio broadcasters.


Preliminary musings on reality TV and lifestyle culture in Mumbai

CCI Bloggers - 27 January 2012 - 2:06pm

Tania Lewis, a Chief Investigator on the ARC discovery project ‘The role of lifestyle television in transforming culture, citizenship and selfhood: China, Taiwan, Singapore and India’ recently returned from Mumbai, the home of Bollywood and a major entertainment TV hub, where she has been conducting household interviews with research associate Kiran Mulenhalli, extending on previous interviews she has done with lifestyle and reality TV producers in Delhi and Mumbai. In November and December 2011, Kiran and Tania conducted 18 audience interviews, primarily with households but also with individuals, from a broad range of economic, occupational, ethnic, religious and class backgrounds (though mainly ‘middle class’), and including a wide range of ages (family interviews often included children and also grandparents).

Alok's family

In the household interviews we often spent two or three hours in the homes of the participating families, watching and talking about television together, sharing a meal with the families and discussing their lifestyles and consumer practices. We were interested in seeing how television viewing (particularly of reality and lifestyle formats) fitted in with, reflected and influenced their broader lifestyles and their values. Most families watched a range of reality and lifestyle advice shows, from Bigg Boss (the Indian version of Big Brother) to spiritual advice shows (e.g. yoga gurus on morning television). Some of the more popular shows currently on in India (and that we discussed with the families) include the Indian version of Who Wants to Be a Millionaire, which was widely praised by informants for its educational dimensions and for its positive portrayal of ‘the poor’ (Sushil Kumar, a recent winner on the show, for instance, was from a poor family from Bihar and has since gone on to be an ambassador for a Government programme, the Mahatma Gandhi National Rural Employment Guarantee Act, which aims to support poor rural families in the north). We also, somewhat ironically, watched MasterChef Australia on the large digital flat screen TV of one wealthy Muslim family who described it as one of their favourite shows, pointing to the growing popularity of cosmopolitan tastes around food and fashion more broadly among the Indian middle class.

Sushila's family

Such trends might suggest a globalization and homogenization of Indian lifestyles and consumption, and certainly many of the families spoke of the rapidly growing consumer culture that had engulfed India over the past decade and the growing role of western/global brands in people’s everyday lives, from the ubiquitous (and very cheap) Maggi noodles, consumed it seems by everyone, to Nivea face whitening products, used widely by both men and women. While these kinds of consumption practices can, on the surface, be seen as ‘western’, they are also Indianised in all kinds of ways, and our informants often discussed their lifestyles and consumption as being influenced by a mix of Indian and ‘western’ traditions. Indeed, the settings in which we conducted the interviews often reflected this, with television sets placed next to religious shrines and furnishings displaying a hybrid mix of global and local styles. And while reality TV is very popular in Mumbai, one of the most popular shows, which many families mentioned (and which we watched with a few families in situ), was a down-home variety-style/advice show starring a Gujarati actor/TV personality, probably little known outside of north-west India, reflecting the ongoing potency of highly localized televisual and lifestyle cultures even in cosmopolitan Mumbai.

TCF Australian nonwovens manufacturing technology roadmap

Creative Economy News and Research - 27 January 2012 - 1:14pm
Creator:  Bill Humphries

This report is an industry-aligned technology roadmap of the TCF Nonwoven Textiles field. It provides detail to help ensure the industry has the technical capability to be globally competitive.
