Uorain

Neural model of the University of Bristol researchers and their work

Kacper Sokol

University of Bristol modelled as a brain

  • Brain — University of Bristol
  • Lobes — Faculties
  • Lobules — Schools
  • Neurons — Researchers
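(Figure: brain diagram of the University of Bristol (UoB). The lobes are labelled Biomedical Sciences, Science, Engineering and Arts; the lobules are labelled Cellular & Molecular Medicine, Biochemistry, Mathematics, Physics, Chemistry, Merchant Venturers', Queen's, Humanities and Arts; each neuron carries a researcher ID, e.g. 123456.)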

Collaboration = Excitation

Each researcher is modelled as an excitatory neuron

Since we know the year of each publication, we set the learning rate accordingly

i.e. the more recent a publication is, the more it influences the strength of the connection between researchers

(Plot: learning rate value for each publication year)
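The slides do not give the actual learning-rate schedule, so the following is only a minimal sketch under stated assumptions: the learning rate is assumed to decay exponentially with a paper's age relative to 2013 (the last year in the data), and each co-authored paper is assumed to strengthen the connection between every pair of its authors by that year's learning rate, in the spirit of a Hebbian update. The names `learning_rate` and `hebbian_weights` and the `decay` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def learning_rate(year, latest_year=2013, decay=0.5):
    """Hypothetical schedule: 1.0 for the latest year, multiplied by `decay` per year back."""
    return decay ** (latest_year - year)


def hebbian_weights(papers, latest_year=2013, decay=0.5):
    """Accumulate connection strengths between co-authors.

    papers: iterable of (year, author_ids). Every co-authored paper "fires" its
    authors together, so each pair's connection grows by that year's learning rate.
    """
    weights = defaultdict(float)
    for year, authors in papers:
        eta = learning_rate(year, latest_year, decay)
        # sorting gives each unordered pair of authors a single canonical key
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += eta
    return dict(weights)


# Toy input with made-up author IDs.
papers = [(2013, [1, 2]), (2011, [2, 1]), (2013, [3, 4])]
print(hebbian_weights(papers))  # {(1, 2): 1.25, (3, 4): 1.0}
```

With `decay=0.5`, a 2011 paper counts a quarter as much as a 2013 one; any other monotonically increasing schedule could be swapped into `learning_rate`.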

The data

  • Raw data
  • Preprocessing
  • Outputs
  • Data statistics

Raw data

Inputs

  • staff.csv
    • person ID
    • published name
    • association (organisational code)
    • job title
  • authors.csv
    • person ID
    • publication ID
  • outputs.csv
    • publication ID
    • title
    • type of publication
    • publication year
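A minimal sketch of reading the three input files with Python's standard `csv` module. The slides only describe the fields, so the column headers used here (`person_id`, `published_name`, `association`, `job_title`, `publication_id`, `title`, `type`, `year`) and the record class names are assumptions to be adjusted to the real files.

```python
import csv
from dataclasses import dataclass


@dataclass
class StaffRecord:
    person_id: int
    published_name: str
    association: str  # organisational code, e.g. "SOCS"
    job_title: str


@dataclass
class OutputRecord:
    publication_id: int
    title: str
    publication_type: str
    year: int


def load_staff(f):
    """Read staff.csv rows (assumed headers) into StaffRecord objects."""
    return [StaffRecord(int(r["person_id"]), r["published_name"],
                        r["association"], r["job_title"])
            for r in csv.DictReader(f)]


def load_authors(f):
    """Read authors.csv into (person_id, publication_id) pairs."""
    return [(int(r["person_id"]), int(r["publication_id"])) for r in csv.DictReader(f)]


def load_outputs(f):
    """Read outputs.csv rows (assumed headers) into OutputRecord objects."""
    return [OutputRecord(int(r["publication_id"]), r["title"],
                         r["type"], int(r["year"]))
            for r in csv.DictReader(f)]
```

Each loader accepts any open text file (or iterable of lines), e.g. `with open("outputs.csv", newline="") as f: outputs = load_outputs(f)`.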

Information

Basic statistics about each researcher.

University structure.

Identify how many times each pair of researchers collaborated in any particular year, based on how many papers they co-authored…

…identify the strength of internal and external connections between academic units of a particular type (e.g. schools or faculties).
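Counting collaborations per pair and year only needs a join between authors.csv and outputs.csv. A minimal sketch, assuming the authorship rows are available as (person ID, publication ID) pairs and the years as a publication-ID lookup; `collaborations_per_year` is a hypothetical name.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collaborations_per_year(authorship, year_of):
    """Count co-authored papers for every pair of researchers in every year.

    authorship: iterable of (person_id, publication_id) rows (authors.csv)
    year_of:    dict publication_id -> publication year (outputs.csv)
    Returns a Counter keyed by (year, person_a, person_b) with person_a < person_b.
    """
    authors_of = defaultdict(set)
    for person, pub in authorship:
        authors_of[pub].add(person)

    counts = Counter()
    for pub, authors in authors_of.items():
        # single-authored papers yield no pairs, so they drop out automatically
        for a, b in combinations(sorted(authors), 2):
            counts[(year_of[pub], a, b)] += 1
    return counts
```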

Model similarities between papers (based on their titles) to identify potential collaborations.

Preprocessing

  • Publications:
    • Reduce the amount of data (from ~30k to ~8k papers): since we are only interested in modelling collaborations, we can remove all single-authored papers.
    • Assign all the authors to the corresponding publications.
    • Clean, parse and tokenise free-text fields (titles, abstracts, keywords): remove non-ASCII (Unicode) characters, remove stop words, get the stems of all the words…
    • …but unfortunately only the publication title is usable, as keywords are missing in 20% and abstracts in 50% of the dataset entries.

  • Authors:
    • Create a structured object from the type of publication.
    • Create a structured object from the job titles.
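The publication steps above can be sketched with the standard library alone. This is a minimal illustration, not the original pipeline: the stop-word list is a tiny hypothetical stand-in, and stemming is left out to stay dependency-free (NLTK's `PorterStemmer` is one common choice for that step). `clean_title` and `keep_collaborations` are hypothetical names.

```python
import re
import unicodedata

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "to", "with"}


def clean_title(title):
    """Strip accents and other non-ASCII characters, lower-case, tokenise, drop stop words."""
    ascii_text = (unicodedata.normalize("NFKD", title)
                  .encode("ascii", "ignore").decode("ascii"))
    tokens = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def keep_collaborations(publications, authors_of):
    """Drop single-authored papers and clean the remaining titles.

    publications: dict publication_id -> title
    authors_of:   dict publication_id -> set of person IDs
    """
    return {pid: clean_title(title)
            for pid, title in publications.items()
            if len(authors_of.get(pid, ())) > 1}
```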

Outputs


Publications

id | title | type | year | keywords | abstract | authors
53490655 | communication as information use | (book, chapter) | 2011 | NaN | introduction uncertainty is an unavoidable pro... | [1968, 12503]
56453104 | tobacco | (book, chapter) | 2013 | NaN | NaN | [27487, 22878]

Authors

id | forename | surname | published name | job title | organisation code
10925 | Jamie | Jeremy | [JY Jeremy] | [(, Emeritus, , , , , , Professor, )] | [SOCS]
24576 | Siobhan | Shilton | [SM Shilton, Siobhan M Shilton] | [(, , , , , , , Reader, ), (, , , Research, , ... | [FREN]

Statistics

Published name

  • People use multiple names for their publications
  • Two people use 11 different names each, e.g.:
    KME Turner, Katy M E Turner, Katy Turner, K Turner,
    Katherine Turner, Katy M. E. Turner, K.M.E. Turner, K M E Turner,
    K.M. Turner, Katherine M E Turner, Katherine M. E. Turner

(Plot: y-axis, number of different names used; x-axis, number of people using that many names)
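The histogram behind this plot amounts to counting distinct names per person. A minimal sketch, assuming the published names have been grouped into a list per person ID; `name_usage_histogram` is a hypothetical name.

```python
from collections import Counter


def name_usage_histogram(published_names):
    """Map k -> number of people who publish under k distinct names.

    published_names: dict person_id -> list of published names (may contain repeats)
    """
    return Counter(len(set(names)) for names in published_names.values())
```

A person such as the one listed above, with 11 variants, contributes one count to the bar at k = 11.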

Statistics

Job titles

There are 1426 distinct job titles among 3263 researchers.

Some researchers have multiple job titles in their PURE record (see pie chart).

This makes queries such as "how many professors are there per department/faculty?" possible.
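A query of this kind can be sketched as follows, assuming (hypothetically) that each researcher is available as a pair of lists: their job titles and their organisation codes. Both are lists because a researcher may hold several titles and belong to several units; `count_title_per_unit` is a hypothetical name.

```python
from collections import Counter


def count_title_per_unit(staff, title_word="Professor"):
    """Count researchers per academic unit whose job titles contain `title_word`.

    staff: iterable of (job_titles, org_codes) pairs, one per researcher.
    A researcher is counted once per unit, however many matching titles they hold.
    """
    counts = Counter()
    for job_titles, org_codes in staff:
        # whole-word match, so e.g. "Professorial" would not count
        if any(title_word in title.split() for title in job_titles):
            for code in set(org_codes):
                counts[code] += 1
    return counts
```

Faculty-level counts follow by mapping each unit code to its faculty before counting.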


The most interesting job titles of people who published:

  • Receptionist
  • Bioinformatician
  • Geneticist

(Pie chart: number of job titles per researcher)

Statistics

Job titles

I reorganised the job titles into a 9-tier hierarchy, one tier per line below (the number of occurrences of each word is given in brackets, most popular first):

Visiting (103) | Postdoctoral (46) | Doctoral (5) | Faculty (3) | Undergraduate (1)
Senior (903) | Emeritus (136) | Assistant (8)
Clinical (158) | Data (8) | IT (7) | Technical (6) | Dental (5) | Trial (4) | Systems (4) | Centre (3) | Building (2) | Molecular (2) | Library (2) | General (2) | Team (1) | Programme (1) | HR (1) | Nursery (1) | User (1)
Research (1281) | Teaching (137) | Scientific (4) | Training (3) | Education (2) | Experimental (1)
and Teaching (2) | and Training (1)
Project (42) | Laboratory (15) | Computer (2) | Study (1) | Switchboard (1)
Associate (717)
Lecturer (665) | Fellow (625) | Professor (607) | Reader (210) | Collaborator (128) | Technician (60) | Teacher (57) | Staff (48) | Manager (42) | Consultant (31) | Chair (29) | Support (20) | Tutor (19) | Administrator (16) | Officer (16) | Director (16) | Head (15) | Demonstrator (12) | Dean (7) | Co-ordinator (7) | Assistant (6) | Council (6) | Vice-Chancellor (4) | Leader (3) | Receptionist (3) | Coordinator (3) | Supervisor (3) | Librarian (2) | Instructor (2) | Student (2) | Developer (2) | Accountant (1) | Adviser (1) | Bioinformatician (1) | Worker (1) | Geneticist (1) | Selector (1) | President (1) | Secretary (1) | Trainer (1) | Surveyor (1) | Operator (1)
(Research) (1)
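The structured job titles in the Authors sample, such as (, Emeritus, , , , , , Professor, ), look like 9-slot tuples with one slot per tier. A minimal sketch of such a parser, assuming (hypothetically) that each word of a title goes into the earliest tier that lists it and comes after the previous word's tier; this resolves "Assistant", which appears in tiers 2 and 8. The tier vocabularies are abridged from the listing above, and `TIERS` and `parse_job_title` are hypothetical names.

```python
# Tier vocabularies, abridged from the 9-tier listing (one set per tier).
TIERS = [
    {"Visiting", "Postdoctoral", "Doctoral", "Faculty", "Undergraduate"},
    {"Senior", "Emeritus", "Assistant"},
    {"Clinical", "Data", "IT", "Technical", "Dental", "Trial", "Systems"},
    {"Research", "Teaching", "Scientific", "Training", "Education", "Experimental"},
    {"and Teaching", "and Training"},
    {"Project", "Laboratory", "Computer", "Study", "Switchboard"},
    {"Associate"},
    {"Lecturer", "Fellow", "Professor", "Reader", "Technician", "Teacher",
     "Manager", "Assistant", "Director", "Head", "Dean", "Officer"},
    {"(Research)"},
]


def parse_job_title(title):
    """Map a job title onto a 9-slot tuple, one slot per tier ("" if unused)."""
    words = title.split()
    tokens, i = [], 0
    while i < len(words):
        # "and Teaching" / "and Training" form a single tier-5 token
        if words[i] == "and" and i + 1 < len(words):
            tokens.append("and " + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1

    slots = [""] * len(TIERS)
    start = 0
    for token in tokens:
        # earliest tier at or after `start` whose vocabulary contains the word
        for idx in range(start, len(TIERS)):
            if token in TIERS[idx]:
                slots[idx] = token
                start = idx + 1
                break
        # words that fit no remaining tier (e.g. "of", "School") are ignored
    return tuple(slots)
```

For example, "Research Assistant" places "Assistant" in tier 8, while "Assistant Professor" places it in tier 2.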

Statistics

Multiple associations

Some people are also associated with more than one academic unit…

University structure

Radial tree

The university is structured hierarchically, so it can be modelled as a tree. Due to its size, a linear tree layout is not comprehensible, but a radial one is.

The structure is encoded as a nested JSON file:

          {
            "name": "...",
            "short_name": "...",
            "full_name": "...",
            "url": "...",
            "type": "...",
            "people": ["...", "..."],
            "children": [{...}, {...}]
          }

(Figure: radial tree of the academic units at the University of Bristol)
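Because each node nests its children, simple recursive walks answer most structural questions. A minimal sketch over the node schema shown above, assuming (hypothetically) that faculty nodes carry `"type": "faculty"`; the actual `type` values are not given in the slides. `unit_to_faculty` and `people_below` are hypothetical names.

```python
def unit_to_faculty(node, faculty=None, mapping=None):
    """Map the short_name of every unit at or below a faculty to that faculty."""
    if mapping is None:
        mapping = {}
    if node.get("type") == "faculty":  # assumed type value
        faculty = node["short_name"]
    if faculty is not None:
        mapping[node["short_name"]] = faculty
    for child in node.get("children", []):
        unit_to_faculty(child, faculty, mapping)
    return mapping


def people_below(node):
    """Collect the people listed at this node and at all of its descendants."""
    people = set(node.get("people", []))
    for child in node.get("children", []):
        people |= people_below(child)
    return people
```

Load the file with `json.load` and pass the root node; the unit-to-faculty map is what rolling school-level scores up to faculties needs.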

Publications

descriptive modelling

Since we model interactions between researchers, we only consider papers with more than one author.

We take the year of publication into account: the more recent the publication, the more it contributes to the connection between academic units.

We present results for each year separately and for the whole period 2008–2013.

We model the interactions at both the school and the faculty level.
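Lifting researcher-level connections to academic units can be sketched as follows. This is a minimal illustration under assumptions: the pair weights are taken as given (however they were computed), a pair inside one unit counts as internal and a pair spanning two units as external, and a person with several associations contributes to each of their units. `unit_connections` is a hypothetical name.

```python
from collections import defaultdict


def unit_connections(pair_weights, units_of):
    """Aggregate researcher-pair weights into unit-level connections.

    pair_weights: dict (person_a, person_b) -> connection strength
    units_of:     dict person_id -> set of unit codes (a person may have several)
    Returns (internal, external): internal[u] sums pairs inside unit u,
    external[(u, v)] with u < v sums pairs spanning units u and v.
    """
    internal = defaultdict(float)
    external = defaultdict(float)
    for (a, b), w in pair_weights.items():
        for u in units_of.get(a, ()):
            for v in units_of.get(b, ()):
                if u == v:
                    internal[u] += w
                else:
                    external[tuple(sorted((u, v)))] += w
    return dict(internal), dict(external)
```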

We present two results for faculty interactions:
  • raw: aggregate the scores of all academic units in a faculty;
  • normalised: as above, but accounting for the number of academic units per faculty.
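The two faculty-level variants can be sketched as below, assuming (the slides do not say) that "raw" sums the scores of a faculty's units and that "normalised" divides that sum by the faculty's number of units. `faculty_scores` is a hypothetical name.

```python
from collections import defaultdict


def faculty_scores(unit_scores, faculty_of):
    """Return (raw, normalised) faculty scores from unit-level scores.

    unit_scores: dict unit_code -> score (units without a score count as 0)
    faculty_of:  dict unit_code -> faculty code
    """
    raw = defaultdict(float)
    n_units = defaultdict(int)
    for unit, faculty in faculty_of.items():
        raw[faculty] += unit_scores.get(unit, 0.0)
        n_units[faculty] += 1
    normalised = {f: raw[f] / n_units[f] for f in raw}
    return dict(raw), normalised
```

Normalising stops faculties with many units from dominating simply because they have more units to sum over.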
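Faculty connections
Internal connection (three strongest faculties per year):
year | 1st | 2nd | 3rd
2008 | FSCI | FMDY | FMVS
2009 | FMDY | FSCI | FSSL
2010 | FMDY | FSCI | FSSL
2011 | FMDY | FSCI | FSSL
2012 | FMDY | FSCI | FMVS
2013 | FSCI | FMDY | FSSL
total | FMDY | FSCI | FSSL
Total external connection (three strongest connections per faculty):
rank | REST | FENG | FMDY | FSCI | FOAT | INST | FMVS | FSSL
1st | FENG | FENG | FMDY | FSCI | FOAT | FMDY | FMVS | FSSL
2nd | FSCI | FOAT | FMVS | FOAT | FSCI | FSSL | FMDY | FMDY
3rd | FSSL | FMVS | FSCI | FMDY | FENG | FMVS | FENG | FSCI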
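Faculty connections (normalised)
Internal connection (three strongest faculties per year):
year | 1st | 2nd | 3rd
2008 | FMVS | FSSL | FMDY
2009 | FMDY | FSSL | FMVS
2010 | FMDY | FSSL | FENG
2011 | FMDY | FSSL | FMVS
2012 | FMVS | FMDY | FENG
2013 | FSSL | FENG | FMDY
total | FMDY | FSSL | FMVS
Total external connection (three strongest connections per faculty):
rank | REST | FENG | FMDY | FSCI | FOAT | INST | FMVS | FSSL
1st | FENG | FENG | FMDY | FSCI | FOAT | FMDY | FMVS | FSSL
2nd | REST | REST | FMVS | REST | FENG | FSSL | FMDY | REST
3rd | FSCI | FOAT | INST | FOAT | FSCI | FMVS | FENG | FMDY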
Schools connections
Internal connection (three strongest schools per year, with scores):
year | 1st | 2nd | 3rd
2008 | MVSF (0.04) | CHSE (0.03) | LAWD (0.02)
2009 | CHSE (0.06) | PSYC (0.03) | MODL (0.03)
2010 | CHSE (0.08) | PSYC (0.03) | SPOL (0.03)
2011 | CHSE (0.05) | MVSF (0.04) | PSYC (0.03)
2012 | MVSF (0.08) | CHSE (0.05) | PSYC (0.03)
2013 | CHSE (0.05) | MODL (0.04) | PSYC (0.04)
total | CHSE (0.31) | PSYC (0.20) | MVSF (0.15)
Total external connection (strongest connections per school):
rank INOV CABI MVSF ENGF VESC PSYC CHEM LANG BIOC EDUC PHPH LAWD HUMS
1st MVEN SSCM BIOC MVEN CHSE INOV SCIF SOCS MVSF ENGF MVSF SPOL SART
2nd PSYC PHPH MSAD EDUC SSCM REST PHPH VESC BIOC EFIM SPAI
3rd QUEN PANM QUEN BISC BISC MODL PANM MEED SOCS ORDS MEED
Network analysis

Network analysis is also possible, but the network itself is difficult to visualise because the differences in connection strength are small.

The importance of each node in the network can be expressed by its centrality:

  • Closeness: the reciprocal of the sum of the shortest-path distances from the node to all other nodes; in the schools connections plot, it highlights the units with the most connections to other academic units;
  • Betweenness: based on the number of shortest paths between pairs of other nodes that pass through the node; in the schools connections plot, it again highlights the units with the most connections to other academic units;
  • Eigenvector: based on connections to other important (high-scoring) nodes; in the schools connections plot, it highlights the strong connection between ORDS and MDYF, followed by any node strongly connected to either of them.

Centrality:

Closeness | SOCS (0.80) | SSCM (0.76) | CHEM (0.70) | QUEN (0.69) | MODL (0.41) | GSEN (0.41) | NSQI (0.39)
Betweenness | SSCM (0.14) | SOCS (0.13) | CHEM (0.12) | QUEN (0.09) | LANG (0.00) | CABI (0.00) | INOV (0.00)
Eigenvector | ORDS (0.68) | MDYF (0.59) | CHSE (0.40) | SSCM (0.08) | GSEN (0.00) | LANG (0.00) | MODL (0.00)
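Two of these measures are easy to sketch without a graph library. This is a minimal illustration on an unweighted, connected graph, whereas the schools network is weighted, so the numbers above would not be reproduced exactly; libraries such as NetworkX provide `closeness_centrality`, `betweenness_centrality` and `eigenvector_centrality` for real use. The function names below are hypothetical.

```python
from collections import deque


def closeness(adj):
    """Closeness on an unweighted, connected graph: (n - 1) / sum of BFS distances."""
    n = len(adj)
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (n - 1) / total if total else 0.0
    return scores


def eigenvector(adj, iterations=100):
    """Eigenvector centrality by power iteration on (A + I).

    The x <- x + A x update keeps the principal eigenvector of A but avoids
    the oscillation plain power iteration shows on bipartite graphs.
    """
    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {node: x[node] + sum(x[nb] for nb in adj[node]) for node in adj}
        norm = sum(v * v for v in new.values()) ** 0.5
        x = {node: v / norm for node, v in new.items()}
    return x


# Toy star graph: one hub connected to three leaves.
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(closeness(adj))  # {'hub': 1.0, 'a': 0.6, 'b': 0.6, 'c': 0.6}
```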
Hierarchical clustering

Classical clustering gives meaningless results on this data.

On the other hand, hierarchical clustering produces somewhat meaningful results
(the data are thresholded by the school connection strength).

Depending on the height at which you cut this tree, you obtain different clusterings.
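The slides do not say which linkage was used, so the following is only a sketch with single linkage, one common choice. It assumes connection strengths have been turned into distances (e.g. 1 minus strength) and that pairs removed by the threshold are simply absent. Merging pairs in order of increasing distance with a union-find structure yields the dendrogram, and cutting it at a given height keeps only the merges below that height. All function names are hypothetical.

```python
def _find(parent, u):
    """Return the root of u's cluster, halving the path on the way."""
    while parent[u] != u:
        parent[u] = parent[parent[u]]
        u = parent[u]
    return u


def single_linkage(units, distance):
    """Single-linkage agglomerative clustering (Kruskal-style union-find).

    distance: dict (u, v) -> distance; thresholded-out pairs are absent.
    Returns the dendrogram as (height, u, v) merges in increasing height.
    """
    parent = {u: u for u in units}
    merges = []
    for (u, v), d in sorted(distance.items(), key=lambda item: item[1]):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # u and v are still in different clusters
            parent[ru] = rv
            merges.append((d, u, v))
    return merges


def cut(units, merges, height):
    """Flat clusters obtained by cutting the dendrogram at `height`."""
    parent = {u: u for u in units}
    for d, u, v in merges:
        if d <= height:
            parent[_find(parent, u)] = _find(parent, v)
    clusters = {}
    for u in units:
        clusters.setdefault(_find(parent, u), set()).add(u)
    return sorted(clusters.values(), key=sorted)
```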

Publications

predictive modelling

To discover possible collaborations between schools we model similarities between publication titles.

We do this with tf-idf and cosine similarity, which lets us find similar papers together with their similarity score.

We threshold these scores at 0.25 to keep only relevant pairs of papers, and then build connections between academic units based on the authors' associations.
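A dependency-free sketch of this step, assuming titles have already been cleaned into token lists. It uses the plain idf = log(N / df) (scikit-learn's `TfidfVectorizer` uses a smoothed variant by default), and summing similarities per pair of units is an assumption about how the connections are built. All function names and the `units_of_paper` lookup are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """docs: dict paper_id -> list of title tokens. Returns paper_id -> {term: weight}."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {pid: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for pid, tokens in docs.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_pairs(docs, threshold=0.25):
    """Pairs of papers whose title similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    pairs = {}
    for a, b in combinations(sorted(docs), 2):
        s = cosine(vectors[a], vectors[b])
        if s >= threshold:
            pairs[(a, b)] = s
    return pairs


def possible_collaborations(pairs, units_of_paper):
    """Sum the similarity of linked papers for every pair of distinct academic units."""
    links = Counter()
    for (a, b), s in pairs.items():
        for u in units_of_paper[a]:
            for v in units_of_paper[b]:
                if u != v:
                    links[tuple(sorted((u, v)))] += s
    return links
```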

Possible collaborations
(Plot: possible collaborations between academic units)

Possible extensions

  • With more data, our association matrix would better model the interactions between individuals and academic units (leading to better clustering).
  • Use keywords and abstracts to model similarities between publications, and hence discover more potential collaborations.
  • Improve visualisations and make them more interactive.

(Acknowledgments: Miquel Perello Nieto, Peter Flach)