Topics in Theory
After experimenting with topic models of Critical Inquiry, I thought it would be interesting to gather several of the theoretical journals in JSTOR’s collection and run a model with more topics on this larger corpus, to see how the algorithm would chart developments in theory.
I downloaded all of the articles (word-frequency data for each article, that is) in New Literary History, Critical Inquiry, boundary 2, Diacritics, Cultural Critique, and Social Text. I then ran a model fitted to one hundred topics. I had to adjust the stop-word list to account for common words and, unsuccessfully, for words in other languages. What I should have done was use the supplied stop-word lists in those languages as well. At least this way there is a chance that interesting words in those languages will cluster together.
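As a rough illustration of what combining stop-word lists across languages might look like, here is a minimal Python sketch. The three tiny lists and the function names are made up for the example; they are not the lists or code used for this model, and real per-language lists would be far longer.

```python
from collections import Counter

# Tiny illustrative stop-word lists (hypothetical stand-ins, not the
# lists used for the model described in this post).
ENGLISH_STOPS = {"the", "of", "and", "to", "in"}
FRENCH_STOPS = {"le", "la", "les", "de", "et", "nous", "mais"}
GERMAN_STOPS = {"der", "die", "das", "und", "ist", "ein"}


def combined_stopwords(*stop_lists):
    """Union any number of per-language stop-word collections."""
    combined = set()
    for stops in stop_lists:
        combined |= {word.lower() for word in stops}
    return combined


def filter_counts(word_counts, stopwords):
    """Drop stop words from one article's word-frequency table."""
    return Counter(
        {word: n for word, n in word_counts.items() if word.lower() not in stopwords}
    )


stops = combined_stopwords(ENGLISH_STOPS, FRENCH_STOPS, GERMAN_STOPS)
article = Counter({"the": 120, "nous": 14, "baudelaire": 9, "das": 11, "kafka": 6})
filtered = filter_counts(article, stops)
```

With the function words gone, only the content words ("baudelaire", "kafka") remain to be assigned to topics.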
The topics themselves looked good, I thought. One hundred was about the right number: I saw only the usual, acceptable level of merging and splitting. This topic, for example, shows the kind of merging I mean: “aboriginal rap[?] women australian climate weather movement work warming time australia housework change social power oroonoko[?] make wages years.” I also didn’t lemmatize this corpus, although I know how to. Lemmatizing takes a lot of time the way I’m doing it (using the WordNet plugin of the Python Natural Language Toolkit). And I frankly haven’t been that impressed with the specificity of the lemmatized models that I have run.
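Since the data here is word frequencies rather than running text, lemmatizing amounts to merging the counts of words that share a lemma. The following is only a sketch of that idea, not the code I used; `lemmatize_counts` and the toy lookup table are hypothetical names invented for the example.

```python
from collections import Counter


def lemmatize_counts(word_counts, lemmatize):
    """Merge the counts of words that share a lemma.

    `lemmatize` can be any callable that maps a word to its lemma.
    """
    merged = Counter()
    for word, n in word_counts.items():
        merged[lemmatize(word)] += n
    return merged


# Toy lemmatizer for illustration only; a real one would consult a lexicon.
TOY_LEMMAS = {"poems": "poem", "poets": "poet", "wages": "wage"}


def toy_lemmatize(word):
    return TOY_LEMMAS.get(word, word)


counts = Counter({"poem": 12, "poems": 7, "poet": 4, "poets": 3, "wages": 2})
lemmatized = lemmatize_counts(counts, toy_lemmatize)
```

With NLTK installed and its WordNet data downloaded, the `lemmatize` method of `nltk.stem.WordNetLemmatizer` could be passed in place of the toy function. It treats words as nouns unless told otherwise, which is one reason lemmatized models can lose specificity.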
Visualizing changes in topics over time is quite difficult. Each year will have thousands of observations per topic, and taking the mean of each topic per year doesn’t always produce very readable results. Benjamin Schmidt suggested trying the geom_smooth function of ggplot2, which I never had much luck with. The main reason, I found, that I couldn’t get it to work very well is that I was trying to create a composite graph of every topic using facet_wrap. Each topic graphed by itself with geom_smooth produced better results.
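For readers who want to see what "taking the mean of each topic per year" involves, here is a minimal Python sketch. It assumes each article comes as a year paired with a dictionary of topic proportions; that input shape and the name `yearly_topic_means` are assumptions made for the example, not a description of my actual scripts.

```python
from collections import defaultdict


def yearly_topic_means(docs):
    """Average each topic's proportion over all articles in a year.

    docs: iterable of (year, {topic_id: proportion}) pairs, one per article.
    Returns {topic_id: {year: mean proportion}}. A topic missing from an
    article counts as zero for that article.
    """
    sums = defaultdict(lambda: defaultdict(float))
    articles_per_year = defaultdict(int)
    for year, proportions in docs:
        articles_per_year[year] += 1
        for topic, p in proportions.items():
            sums[topic][year] += p
    return {
        topic: {year: total / articles_per_year[year] for year, total in by_year.items()}
        for topic, by_year in sums.items()
    }


docs = [
    (1990, {39: 0.10, 13: 0.30}),
    (1990, {39: 0.30, 13: 0.10}),
    (1991, {39: 0.50, 13: 0.20}),
]
means = yearly_topic_means(docs)
```

Each topic then reduces to one number per year, which is what a line graph or a smoother would be drawn over.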
Here, for example, is the graph for this coherent topic—“gay sexual
queer sex lesbian aids sexuality homosexual men homosexuality identity
heterosexual male gender desire social lesbians drag butler”:
The chronology you see above does approximately track the rise of queer
theory, though the smoothing algorithm is full of mystery and error. A
scatter-plot of the same graph would be far noisier and also not reveal
much in the way of change over time. This topic should also correlate
somewhat roughly to postcolonial theory–“indian india hindu colonial
postcolonial subaltern british indians nationalist gandhi english
bengali religious caste nationalism sanskrit maori bengal west”:
I’m suspicious of this linear increase, needless to say. The underlying data is messier. Would Marxist theory show any decline around the predictable historical period? (Terms: “social class theory ideology political production ideological historical marxist marx bourgeois capitalist society capitalism marxism economic labor relations capital”)
That is roughly what I was expecting. But compare “soviet party revolutionary socialist revolution socialism communist political national left union struggle europe russian fascism war central movement european”:
I have hopes for the exploratory potential of topic-modeling
disciplinary change this way. Another interesting topic that shows a
linear-seeming increase (“muslim islamic islam religious arab muslims
secular arabic algerian orientalism rushdie religion iranian iran
western turkish ibn secularism algeria”):
To show what the data looks like with different visualizations, I’m
going to cycle through several types of graphs of the above topic. The
first is a line graph:
Next is a scatter-plot:
Now a scatter-plot with the scale_y_log10 function applied:
All of the graphs reveal a general upward trend, I think, though not as much as the smoothing function does. I would be delighted to hear any ideas anyone has about better ways to graph these. I’ve not found any improvement in grouping by document rather than by year.
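To make the effect of smoothing less mysterious, here is a sketch of a much simpler smoother than anything geom_smooth fits: a centered moving average over yearly means. It is a stand-in for illustration, not what ggplot2 does, and the sample values are invented.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the two ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented yearly means for a single topic, noisy but drifting upward.
yearly_means = [0.01, 0.05, 0.02, 0.06, 0.03, 0.08, 0.04]
trend = moving_average(yearly_means, window=3)
```

A wider window gives a flatter line. That is the same trade-off a smoothing function makes, and it is why a smoothed curve can look more decisive than the underlying data warrants.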
There’s more I plan to do with this data set, including coming up with better ways to visualize it (more precision, efficient ways of seeing many topics at once, etc.). I am including the full list of topics after the fold for reference. Some reveal OCR errors; others are publishing artifacts that my first rounds of stopping didn’t remove.
Update (2/14/12): I created a browser of this model that shows the articles most closely associated with each topic.
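The core of such a browser is a ranking of articles by their proportion of a given topic. The sketch below shows one way that ranking might be done; it is not the browser's actual code, and the article identifiers, proportions, and the name `top_articles` are all hypothetical.

```python
def top_articles(doc_topics, topic_id, n=3):
    """Return the n articles with the highest proportion of one topic.

    doc_topics: {article_id: {topic_id: proportion}}
    """
    ranked = sorted(
        doc_topics.items(),
        key=lambda item: item[1].get(topic_id, 0.0),
        reverse=True,
    )
    return [(article, props.get(topic_id, 0.0)) for article, props in ranked[:n]]


# Hypothetical articles and proportions, keyed by topic number.
doc_topics = {
    "article-a": {61: 0.42, 89: 0.10},
    "article-b": {61: 0.05, 89: 0.55},
    "article-c": {61: 0.31},
}
closest = top_articles(doc_topics, 61, n=2)
```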
0 0.0321 american left radical political movement social economics black time years war power orwell decade began america back students books
1 0.02376 chinese china western hong cultural kong wang boundary modern zhang west mao lu intellectual shanghai intellectuals japanese east liu
2 0.09169 french france paris pierre jean barthes bataille flaubert work proust sartre jacques louis revolution text marcel madame histoire georges
3 0.03662 movement movements left political radical american revolution cultural world aronowitz issue civil change issues society sexual freedom social history
4 0.18127 language speech words word linguistic translation english voice meaning discourse speaking speaker act speak sentence utterance languages spoken verbal
5 0.02654 woolf virginia jane beckett lentricchia gilbert austen lawrence moore richards eve forster adam room samson edna shaw bloomsbury stevens
6 0.03491 asian american united ethnic pacific states immigrant immigrants immigration racial transnational border korean diaspora hawaiian mexican chicano identity diasporic
7 0.68931 power suggests figure authority text version rhetoric appears makes irony offers terms force calls rhetorical cited earlier act ironic
8 0.03294 latin spanish cuban don cuba puerto juan borges spain america mexico mexican brazilian rican american jose brazil maria garcia
9 0.0539 medieval middle oral latin literary ages ancient tradition written auerbach century classical texts renaissance modern rhetoric augustine early philology
10 0.12145 south national state political government people rights nation community local international official africa human african policy population land resistance
11 0.04518 medical health body medicine disease aids drug illness mental patients treatment patient clinical healing hysteria madness addiction bodies coffee
12 0.05613 german das den germany als berlin ist kafka benjamin ein karl eine trans ich mit dem ernst friedrich sich
13 0.15228 social class theory ideology political production ideological historical marxist marx bourgeois capitalist society capitalism marxism economic labor relations capital
14 0.18724 philosophy theory philosophical knowledge truth thought science scientific world wittgenstein epistemological human idea philosophers view language theoretical reason empirical
15 0.21737 university trans john david duke cambridge boundary chicago michael robert harvard oxford richard paul modern james minnesota princeton peter
16 0.0917 aesthetic art benjamin adorno sublime aesthetics work kant experience critique object beautiful form modern beauty concept judgment modernity trans
17 0.12624 narrative story narrator narratives events stories narration time plot tale event voice literary telling structure discourse action account history
18 0.20746 history historical past time present historians historian future modern events period histories century human historiography temporal study change historicism
19 0.0124 aboriginal rap women australian climate weather movement work warming time australia housework change social power oroonoko make wages years
20 0.02402 jewish palestinian arab israel israeli jews palestinians palestine zionist jew arabs state zionism middle west land political east hebrew
21 0.56336 form work general structure individual forms elements analysis principle formal specific works terms single set style function features type
22 0.16719 public york media television news national american times show president united march recent audience private april states people campaign
23 0.11547 identity postmodern cultural politics postmodernism difference discourse social culture power dominant practices identities world resistance discourses history struggle language
24 0.41333 critical studies work cultural theory critique political contemporary essay recent intellectual theoretical ways historical questions analysis question practice discussion
25 0.06531 japanese money economic market economy exchange japan corporate commodity capital financial business capitalism consumption consumer commodities production economics wealth
26 0.09897 sexual women male female woman men body gender sex feminine sexuality desire masculine power man difference masculinity pleasure bodies
27 0.06501 women feminist feminism feminists gender female woman male men sexual patriarchal sex work politics political radical mary history movement
28 0.66109 argument make position evidence response good view critics find simply claim general values arguments claims problem easily issue difficult
29 0.07487 poem romantic poetry poet wordsworth poetic milton poems coleridge william mind nature yeats paradise poets blake thy shelley bloom
30 0.01992 indian india hindu colonial postcolonial subaltern british indians nationalist gandhi english bengali religious caste nationalism sanskrit maori bengal west
31 0.08238 fig photography photograph photographs figure photographic museum portrait picture objects visual pictures image images camera object medium display portraits
32 0.1694 moral human ethical freedom life ethics individual action good morality values person social nature actions reason responsibility man judgment
33 0.20092 literary literature criticism critical history critics theory critic works study language english art tradition texts modern aesthetic writers essays
34 0.14431 writing book writer life writers write read reading written books work autobiography literary autobiographical literature personal wrote style reader
35 0.01414 chomsky dewey ek war read goodman american politics marcuse movement work political state left social radical approach public life
36 0.05149 literary history cited univ text cal notes ity dis ence human ness pro sion tional term form inter comparison
37 0.06194 music musical sound jazz song dance performance sounds voice musicians songs listening play hear art playing singing radio recording
38 0.1099 american america united states national americans war john world william james henry culture north cultural cold history canadian melville
39 0.03316 gay sexual queer sex lesbian aids sexuality homosexual men homosexuality identity heterosexual male gender desire social lesbians drag butler
40 0.17065 death body violence human dead life animal living bodies man fear pain kill blood horror murder animals scene violent
41 0.09262 city space urban spatial building place spaces cities architecture site home center house landscape architectural public places built map
42 0.02614 williams james brown tom fuller eliot maggie american john bishop read margaret people smithson act charlotte book bowl robert
43 0.05497 foucault life power deleuze everyday michel modern sovereignty state political agamben trans sovereign body guattari human discipline politics disciplinary
44 0.18385 love father family marriage woman life man young mother wife story home house husband daughter desire lady women scene
45 0.69503 time order question place point longer end moment truth means present fact back precisely word speak give remains beginning
46 0.30084 water earth land sea sun place green landscape tree river sky trees snow space light stone red white high
47 0.10127 law legal rights state court justice property laws case act authority decision system rule states sovereign contract rules sovereignty
48 0.49656 desire moment return loss form death lost presence absence condition remains sense figure passage identity past end crisis sign
49 0.16624 system time systems theory information communication cognitive affect body processes affective human process temporal level space perception brain distinction
50 0.42525 world experience life human reality consciousness nature process mind imagination vision sense language individual personal meaning man form act
51 0.40925 book published years work text letter written early author title letters books publication read readers english number found wrote
52 0.06248 technology information media computer technological digital technologies electronic machine communication human control world technical data virtual machines web internet
53 0.06998 police crime trial violence criminal prison case murder legal evidence crimes political violent victim victims serial secret justice eichmann
54 0.3546 subject discourse order relation space form discursive object difference practice subjectivity logic place symbolic subjects production position mode boundary
55 0.02432 emerson ellison burke twain hawthorne trilling read invisible ralph huckleberry writers work finn jim black social literature women john
56 0.03159 renaissance pastoral king court courtly queen literary english greenblatt prince sidney elizabethan good marie royal sir henry knight text
57 0.09066 freud psychoanalysis psychoanalytic desire unconscious lacan theory object subject freudian ego psychic symbolic sexual pleasure psychological dream psychology fantasy
58 0.57855 great modern century man life made time age men history years early thought intellectual long world found nineteenth period
59 0.14516 labor economic workers work class social working economy state welfare system industrial market percent capital control poor government union
60 0.1693 text reading literary interpretation meaning texts reader textual work author interpretive readers theory history intention read understanding act interpretations
61 0.02364 muslim islamic islam religious arab muslims secular arabic algerian orientalism rushdie religion iranian iran western turkish ibn secularism algeria
62 0.07184 play theater drama audience dramatic stage performance shakespeare plays theatrical action history hamlet tragedy characters comedy comic character actor
63 0.04363 memory trauma past holocaust memories traumatic event jews nazi history experience truth jewish testimony auschwitz german victims witness war
64 0.18222 image representation images visual space body gaze object vision presence mirror representations represented eye visible point perception represent picture
65 0.06115 film films cinema camera cinematic hollywood movie screen shot documentary frame scene movies early spectator visual images time sequence
66 0.13551 inquiry critical winter autumn abbreviated professor response spring account summer trans claim chicago fact theory point made convention essay
67 0.07161 black white african racial race blacks slave negro racism slavery bois color racist whiteness whites people social class blackness
68 0.11013 metaphor language meaning sign linguistic semiotic semantic signs system theory metaphors word metaphorical structure discourse level words literal semiotics
69 0.04175 ou comme nous mais tout sans akhmatova avec cette sont baudelaire ii fait ses ces aux lydia meme leur
70 0.04025 natural science mathematical scientific mathematics machine physics quantum hobbes human descartes nature mechanical machines body universe set history einstein
71 0.10857 god religious christian religion divine church christianity secular faith theological christ theology jesus sacred spiritual holy biblical tradition jewish
72 0.09419 war military vietnam world united states american nuclear enemy terror cold power peace soldiers torture wars army violence bush
73 0.09257 art painting work artist artistic artists works arts paintings visual aesthetic modernist modernism painter modern museum abstract fried style
74 0.06986 science human scientific nature natural scientists genetic species biological environmental research evolutionary sciences knowledge evolution biology humans life environment
75 0.0255 gramsci italian italy che prison antonio fascist ii notebooks croce giovanni vita canto carlo dante verdi michelangelo rome trans
76 0.21001 people time years good talk lot back things talking women wanted feel work make told interview thing men put
77 0.19357 political politics state public social liberal power democratic society democracy civil freedom sphere people discourse rights private radical intellectuals
78 0.10585 derrida text man reading deconstruction writing language texts jacques deconstructive miller play figure read difference question rhetoric paul essay
79 0.05332 greek tragedy classical tragic ancient oedipus aristotle plato epic socrates homer roman greeks riddle greece city antigone gods athens
80 0.07717 english british irish england london john eighteenth ireland britain victorian william century early thomas sir george late history charles
81 0.09257 colonial european western native west culture world indian cultural africa african peoples imperial discourse europe indigenous anthropology people colonialism
82 0.0366 genre bakhtin genres russian generic dialogue carnival dialogic literary mikhail literature rabelais poetics dostoevsky text speech work theory pushkin
83 0.11295 myth ritual sacred myths king mythic symbolic story magic man traditional hero tale great stories power tales gods ancient
84 0.01573 shame movement black social trip individual larkin youth political white culture lsd term anger person usage heavy cultural man
85 0.10016 poetry poem poems poet poetic poets pound language poetics verse lyric line lines olson words word prose robert form
86 0.72608 kind sense things make work part thing point find fact made world making makes call ways great called sort
87 0.76598 point terms fact question notion problem sense concept part discussion view relation relationship important nature context makes idea simply
88 0.08277 soviet party revolutionary socialist revolution socialism communist political national left union struggle europe russian fascism war central movement european
89 0.10262 global world national cultural postcolonial nation international modernity globalization nationalism states united local transnational economic culture political western capital
90 0.11165 fiction novels literary characters fictional character story reader author joyce narrator readers literature romance james genre narrative works fictions
91 0.03936 game play games players sports playing player baseball sport chess world team rules ball leisure cricket played life living
92 0.05541 diacritics trans derrida ofthe time community relation levinas ethical ethics event gift possibility jacques responsibility work logic blanchot future
93 0.0573 children child family mother parents abuse birth mothers adult adoption childhood baby families maternal father kinship reproductive motherhood home
94 0.12026 university students education academic faculty student research teaching graduate humanities higher knowledge professional school universities studies educational college english
95 0.3797 social society cultural forms individual group ways relations role practices groups community people individuals power public important process life
96 0.24791 young man room street small years day side big home food boy front days car back girl began middle
97 0.09584 heidegger nietzsche thought hegel philosophy time world thinking truth trans spirit philosophical metaphysical essence sense phenomenology existence metaphysics man
98 0.47688 back eyes time face night left hand head day man hands dark light dream words house inside world black
99 0.16072 culture cultural popular mass rock high class contemporary youth production everyday industry dominant cultures american style traditional media consumption