Latest

blog  Unpacking OpenAlex topics classification.

Sep 2024

In this post I take a closer look at the classification of scientific disciplines in OpenAlex, a recently developed database of scientific works. The topics classification has been generated entirely computationally, using a mix of citation clustering techniques and LLM-based labeling. The results, although not always precise, are definitely worth exploring further.


paper  Dimensions: Calculating Disruption Indices at Scale.

Sep 2024 Quantitative Science Studies, Sep 2024. https://doi.org/10.48550/arXiv.2309.06120

Evaluating the disruptive nature of academic ideas is a new area of research evaluation that moves beyond standard citation-based metrics by taking into account the broader citation context of publications or patents. The "CD index" and a number of related indicators have been proposed in order to characterise mathematically the disruptiveness of scientific publications or patents. This research area has generated a lot of attention in recent years, yet there is no general consensus on the significance and reliability of disruption indices. More experimentation and evaluation would be desirable, but this is hampered by the fact that these indicators are expensive and time-consuming to calculate, especially if done at scale on large citation networks. We present a novel method to calculate disruption indices that leverages the Dimensions cloud-based research infrastructure and reduces the computational time taken to produce such indices by an order of magnitude, while also making this functionality available within an online environment that requires no set-up effort. We explain the novel algorithm and describe how its results align with preexisting implementations of disruption indicators. This method will enable researchers to develop, validate and improve mathematical disruption models more quickly and with more precision, thus contributing to the development of this new research area.


blog  Designing great dashboards: a slidedeck.

Jul 2023

What makes a dashboard great? Here is a slide deck (gslides) that consolidates several useful ideas I've run into in the past.


blog  Notes from the book: Deep Work (2016).

Jul 2023

Finally got down to reading the book Deep Work by Cal Newport (2016).


blog  Any sufficiently advanced technology is indistinguishable from magic.

Jun 2023

Arthur C. Clarke once commented that "any sufficiently advanced technology is indistinguishable from magic".


blog  SciGraph 2017-2023.

Feb 2023

Springer Nature retired SciGraph earlier this month. I have been the data architect and then technical lead for this project, so this post is just a reminder of the great things we did in it. Also, a little rant about the things that weren't that great...


blog  Paperpile: a PDF manager with Google Drive backend.

Jan 2023

Paperpile is an online PDF manager that stores your personal data in your Google Drive folder.


blog  Ontospy version 2.0 released.

Oct 2022

Version 2 of the library includes SHACL support as well as various internal refactorings. Ontospy is an open source Python library and command line tool for working with vocabularies encoded in the RDF family of languages.


paper  Generating large-scale network analyses of scientific landscapes in seconds using Dimensions on Google BigQuery.

Sep 2022 International Conference on Science, Technology and Innovation Indicators (STI 2022), Granada, Sep 2022.

The growth of large, programmatically accessible bibliometrics databases presents new opportunities for complex analyses of publication metadata. In addition to providing a wealth of information about authors and institutions, databases such as those provided by Dimensions also provide conceptual information and links to entities such as grants, funders and patents. However, data is not the only challenge in evaluating patterns in scholarly work: these large datasets can be challenging to integrate, particularly for those unfamiliar with the complex schemas necessary for accommodating such heterogeneous information, and those most comfortable with data mining may not be as experienced in data visualisation. Here, we present an open-source Python library that streamlines the process of accessing and diagramming subsets of the Dimensions on Google BigQuery database and demonstrate its use on the freely available Dimensions COVID-19 dataset. We are optimistic that this tool will expand access to this valuable information by streamlining what would otherwise be multiple complex technical tasks, enabling more researchers to examine patterns in research focus and collaboration over time.


blog  Bringing quotations back to life.

Jul 2022

There's a new section on this site that lets you navigate quotations: quotes.michelepasin.org. It's just a cut-down implementation of an old idea I worked on a while ago, but you know... sometimes it is useful to start from scratch and re-think things from the ground up.