Open Credo

Knowledge Graphs as a catalyst for change

Detecting Author Misconduct With The Power Of Knowledge

SAGE Publishing is a global academic publisher driven by the belief that social and behavioural science has the power to improve society. SAGE produces educational resources that support instructors to prepare the citizens, policy makers, educators and researchers of the future. They successfully publish more than 1,000 journals and 900 new books globally each year.

OpenCredo was approached by SAGE to help them on their journey to become more data-driven by gaining insights into their creators, customers and – ultimately – their business.

Publishing companies are actively battling industrialised cheating, in which some companies systematically produce falsified research for purchase. Authors of research papers are required to contribute to the research described in the paper, so paying for an authorship slot is a form of misconduct.

SAGE was looking for a technology partner to develop a solution that would identify these fraudulent forms of authorship.

Why OpenCredo?

OpenCredo have deep expertise in the data space, from defining the Data Strategy to the design and implementation of end-to-end Data & ML solutions. We’ve helped various clients by setting up secure and scalable data engineering pipelines and data science development environments to experiment faster and provide better actionable insights. 

Using Graph Technologies To Uncover Hidden Links

OpenCredo worked to deliver a solution that would enable the SAGE team to work collaboratively on a secure data science platform, where they could explore the relationships between authors and published papers. Our team of consultants achieved this by modelling the data set and ingesting it into a Neo4j graph database to create a knowledge graph.
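To make the modelling idea concrete, here is a minimal sketch in plain Python of an author-paper graph. It is an illustration only, not the schema used in the project: the sample records, the function names build_graph and co_authors, and the two-map representation are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample records: (author, paper) pairs invented for illustration.
AUTHORSHIPS = [
    ("alice", "paper-1"),
    ("bob", "paper-1"),
    ("bob", "paper-2"),
    ("carol", "paper-2"),
    ("dave", "paper-3"),
]


def build_graph(authorships):
    """Return two adjacency maps: author -> papers and paper -> authors."""
    papers_by_author = defaultdict(set)
    authors_by_paper = defaultdict(set)
    for author, paper in authorships:
        papers_by_author[author].add(paper)
        authors_by_paper[paper].add(author)
    return papers_by_author, authors_by_paper


def co_authors(author, papers_by_author, authors_by_paper):
    """Follow author -> paper -> author to find everyone who shares a paper."""
    found = set()
    for paper in papers_by_author.get(author, ()):
        found |= authors_by_paper[paper]
    found.discard(author)  # an author is not their own co-author
    return found


papers_by_author, authors_by_paper = build_graph(AUTHORSHIPS)
print(sorted(co_authors("bob", papers_by_author, authors_by_paper)))
```

In a graph database the same two-hop question becomes a single pattern match over author and paper nodes, which is what makes hidden links between authors cheap to explore.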

The GCP-based solution consisted of an automated ingestion pipeline capable of running graph-based algorithms for fraud detection. The team ingested 1 TB of data and deployed it onto fully secured infrastructure, where it could be stored and queried for further analysis.

Over 300 million nodes and 2 billion relationships were analysed using Jupyter Notebooks and different graph traversal algorithms, leveraging both similarity and community detection.
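As a rough illustration of what similarity and community detection mean on authorship data, the sketch below scores author pairs by Jaccard similarity of their paper sets and groups authors into connected components of the co-authorship graph. These are simple stand-ins chosen for brevity, not necessarily the algorithms used in the project; the sample data, the function names and the 0.5 threshold are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical data: author -> set of paper identifiers, invented for illustration.
PAPERS_BY_AUTHOR = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p3"},
    "carol": {"p3", "p4"},
    "dave": {"p5"},
    "erin": {"p5", "p6"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(papers_by_author, threshold):
    """Return (author, author, score) for pairs at or above the threshold."""
    pairs = []
    for left, right in combinations(sorted(papers_by_author), 2):
        score = jaccard(papers_by_author[left], papers_by_author[right])
        if score >= threshold:
            pairs.append((left, right, score))
    return pairs


def communities(papers_by_author):
    """Group authors linked by shared papers, using a union-find structure."""
    parent = {author: author for author in papers_by_author}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_author_of = {}
    for author in sorted(papers_by_author):
        for paper in papers_by_author[author]:
            if paper in first_author_of:
                # Shared paper: merge this author's group with the earlier one.
                parent[find(author)] = find(first_author_of[paper])
            else:
                first_author_of[paper] = author

    groups = {}
    for author in papers_by_author:
        groups.setdefault(find(author), set()).add(author)
    return sorted(sorted(group) for group in groups.values())


print(similar_pairs(PAPERS_BY_AUTHOR, threshold=0.5))
print(communities(PAPERS_BY_AUTHOR))
```

Pairs with unusually high overlap, or tightly knit groups with few links to the rest of the graph, are candidates for closer review rather than proof of misconduct. At the scale described above, this kind of analysis would run inside the graph platform rather than over in-memory dictionaries.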

Becoming A Data Driven Organisation

During the delivery, SAGE’s data science team were also upskilled, enabling them to continue the efforts of identifying fraudulent authors.

Through this partnership, different departments in the organisation were able to gain a deeper understanding of the business and are starting to discuss how to make data central to all their decision making. SAGE are now taking this further by building their own in-house data product teams. 

Through this partnership, SAGE Publishing acquired: 

  • A production-ready POC that can detect potential authorship misconduct through graph and ML algorithms, which SAGE is now looking to take into production.
  • Digital transformation, through knowledge and understanding gained across the organisation. SAGE is now taking further steps towards becoming a data-driven organisation by building in-house data product teams.
  • Increased data literacy, with different departments in the organisation gaining more understanding and an appetite to learn about the data they own and enrich it via external resources using graph algorithms.
  • Upskilling and knowledge transfer across their data science team, enabling them to continue detecting authorship misconduct in-house.

“OpenCredo’s expertise & experience, combined with a genuine desire and ability to upskill and transfer knowledge, has been invaluable in helping us increase our data and cloud literacy within the organisation, as well as move forward with our strategic objectives. If you have a challenging data platform project you are embarking on, I would highly recommend working with OpenCredo as a trusted partner.” – Helen King, Head of Transformation at SAGE Publishing