Introducing Glow: an open-source toolkit for large-scale genomic analysis

Posted in Announcements, Apache Spark, bioinformatics, Company Blog, Data and ML Research, Data Engineering, genetics, Genomics, Glow, Open Source, Regeneron

The key to solving some of today’s most challenging medical problems lies in the analysis of genomic data. Understanding how minor changes in an individual’s genome affect their overall health is fundamentally a data-driven challenge that requires integration across hundreds of thousands of individuals. By analyzing genomes across large cohorts, researchers […]

Parallelizing SAIGE Across Hundreds of Cores

Posted in Data and ML Industry Use Case, Data and ML Research, Data Science and Machine Learning, Delta Lake, Ecosystem, Engineering Blog, Genomics, GWAS, healthcare, Platform, VCF

As population genetics datasets grow exponentially, it is becoming impractical to work with genetic data without leveraging Apache Spark™. There are many ways to use Spark to derive novel insights into the role of genetic variation in disease processes. For example, Regeneron works directly on Spark SQL DataFrames, and the open-source Hail package can be […]

A Guide to Training Sessions at Spark + AI Summit, Europe

Posted in Apache Spark, Company Blog, Data and ML Industry Use Case, Data and ML Research, Data Engineering, Data Science, Data Science and Machine Learning, data-science, Delta Lake, Education, Events, Keras, MLflow, Productionizing Machine Learning, PyTorch, Spark + AI Summit, Spark SQL, Structured Streaming, TensorFlow, training

Education and the pursuit of knowledge are lifelong journeys that are never complete: there is always something new to learn, a new professional certification to add to your credentials, a knowledge gap to fill. Training at Spark + AI Summit, Europe is not only about becoming an Apache Spark expert. Nor is it only about being […]