The Origins of SARS-CoV-2: Part 3

Was this virus designed in a lab? Was it accidentally released?


In Part 1 and Part 2 we dug into the science of where SARS-CoV-2 is from and what makes it different. But what does the data tell us about the alternative origin stories?

A Centers for Disease Control and Prevention (CDC) scientist demonstrates a lab technique called “candling,” which determines if an egg is suitable for use in growing viruses. Photographer James Gathany.


A Review of Parts 1 and 2 - Origin Story

If you haven’t read them yet, check out Part 1 and Part 2 of this story. But as a refresher, here’s a quick review:

Part 1: A story of bats and pangolins

In Part 1 we explained how scientists investigate the origins of a virus. We looked at how to determine the history of a virus and its family tree based on its genome. We also discussed the leading hypothesis for the origins of SARS-CoV-2:

  1. Most of the SARS-CoV-2 genome likely comes from a coronavirus closely related to the horseshoe bat coronavirus, Bat CoV RaTG13.

  2. The receptor binding domain (RBD) of the spike protein is closely related to an entirely different coronavirus found in Malayan pangolins.

Finally, we mentioned this new RBD region made SARS-CoV-2 far better at entering human cells than the original SARS-CoV.

Top: the intermediate horseshoe bat, courtesy of mammalwatching.com

Bottom: the Malayan pangolin; Firdia Lisnawati: AP Photo

Part 2: A new set of tools

In Part 2, we looked at what makes SARS-CoV-2 different from other coronaviruses and why it’s so good at infecting humans. We focused on three components of the SARS-CoV-2 spike protein:

  1. Its spike protein is very good at binding human ACE2 to enter our cells (discussed in Part 1)

  2. The spike protein can be activated in a wide range of cells and tissues (polybasic cleavage site)

  3. The spike protein may be better at hiding from antibodies (glycan shield)

The amino acid sequence for the cleavage sites of the most closely related viruses to SARS-CoV-2. Edited from Andersen et al 2020.

The Best Hypothesis

So what’s the best hypothesis? The available data suggests a two-step process that may have given rise to SARS-CoV-2:

  1. A bat coronavirus likely infected an intermediary animal (potentially a Malayan pangolin) where it recombined with a non-bat coronavirus.

  2. Over time, either in the intermediary animal or while in humans, SARS-CoV-2 developed additional mutations: a polybasic cleavage site and a nearby O-linked glycan addition site.

It’s worth noting there are actually two competing hypotheses. In hypothesis 1, SARS-CoV-2 gains its new tools before spillover into humans. In hypothesis 2, SARS-CoV-2 gains these tools while amplifying in humans.

If hypothesis 1 is correct, then somewhere in the wild, there is a reservoir of viruses that look very similar to SARS-CoV-2. Whether it is in pangolins, bats, or some other intermediate animal, there may be other SARS-like viruses that are poised to spill over into humans again. If we knew this to be true, we could take action now to develop surveillance programs and potential drugs and vaccines in preparation for such an event.

If hypothesis 2 is correct, then when SARS-CoV-2 first infected humans, it probably replicated and spread less well (like MERS, another human coronavirus). Over multiple replication cycles, the virus made mistakes, and some mutants gained an advantage and began spreading efficiently among humans. If this is true, then we can implement better measures to monitor humans for new infections, and catch spillover viruses before they “get good” at spreading in humans.


Most scientists will look at the data gathered so far and say we’re on the right track because:

  1. All the current data fits our hypotheses

  2. Our hypotheses are the most parsimonious and best satisfy Occam’s Razor

  3. Our hypotheses are testable and falsifiable

  4. This strongly implies the virus was not designed or modified in a lab

However, the internet is full of theories and questions that don’t quite fit the scientific mold. And it’s important to consider alternative theories and not simply brush them all off. In addition, it’s important for the scientific community to openly revise their hypotheses when new data becomes available. Given this, we need to ask another question: Did SARS-CoV-2 originally come from the Huanan Seafood Wholesale Market?


The Huanan Seafood Wholesale Market (武漢華南海鮮批發市場)

Closure of the Huanan Seafood Wholesale Market on January 1st, 2020. Photo: AFP

From the early epidemiological data, it was reported that the Huanan Seafood Wholesale Market was the likely origin of the SARS-CoV-2 outbreak. The only link found between many of the early cases was the market itself, and environmental samples from the market were found to contain SARS-CoV-2. It’s easy to understand why the market was implicated early on, but in science we must be willing to change our hypothesis in the face of new data.

Upon further investigation, the first detected case, from December 1st, was found to have no connection to the Huanan Seafood Wholesale Market. Also, that patient was never linked to any later cases of COVID-19. A third of the first 41 cases had no connection to the market, including 3 of the first 4 cases reported. Given this data, we have a new hypothesis: the first human cases of SARS-CoV-2 infection must have happened before December 2019 and likely did not originate at the Huanan Seafood Wholesale Market.

How do we test this? We do exactly what we did comparing SARS-CoV-2 to other coronaviruses except this time, we compare it to itself in different patients. To do this, one group gathered many sequences of SARS-CoV-2 and compared those sequences to build a family tree:

Modified from Yu et al 2020 [DOI: 10.12074/202002.00033]. Focus on group C in red. This is the group of sequences associated with the Huanan Market. D (brown) and E (yellow) are later versions of this as well. But A (green) and B (blue) are more closely related to the bat sequence than the red/yellow/brown sequences are, and they seem to be the ancestors of the market sequences. This indicates that the virus must have already spread in humans before it reached the Huanan market, and therefore the market was not the original source.

The above image has a family tree at the top and a proposed mechanism for viral divergence at the bottom. From this study, it appears the first infections were in Wuhan, but probably not at the Huanan Seafood Wholesale Market. Instead, the virus entered humans earlier (shown above as group A in green and group B in blue). Eventually, the virus appeared at the market and infected a large group of people. Two additional pieces of evidence are especially noteworthy:

  1. The earliest sequenced case could not be linked to the Huanan Market

  2. Sequences from patients connected to the market could not account for the entire diversity of all SARS-CoV-2 sequences collected afterwards. In other words, careful sequence comparisons show that some of the viruses aren’t “children” of the viruses acquired at the market.

This indicates the virus had already infected a small number of people prior to its introduction to the Huanan Seafood Wholesale Market, likely dating back to November 2019. A small but important thing to note for later is that all subsequent infections across the world appear to be “the children of” groups A and B. This indicates that while the Huanan Seafood Wholesale Market is not the origin, the origin was among these first patients in China. These implications are also important when considering the role these “wet markets” may or may not play in starting and/or spreading zoonotic infections.
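The rooting argument above can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to the same length, and it uses invented toy strings with hypothetical group labels rather than real SARS-CoV-2 or bat coronavirus data. The idea is simply that the group with the fewest differences from an outgroup (here standing in for the bat sequence) sits closest to the root of the family tree. Real analyses, such as the one by Yu et al., work with full genomes and proper phylogenetic methods, so treat this as an illustration of the logic only.

```python
def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_outgroup_distance(groups, outgroup):
    """Sort groups from closest to farthest from the outgroup sequence."""
    distances = {name: hamming(seq, outgroup) for name, seq in groups.items()}
    return sorted(distances.items(), key=lambda item: item[1])


# Toy, invented sequences (NOT real data), chosen only to mirror the story above.
toy_outgroup = "ACGTACGTACGT"      # stand-in for the bat sequence
toy_groups = {
    "A": "ACGTACGTACGA",           # differs from the outgroup at 1 position
    "B": "ACGTACGTTCGA",           # differs at 2 positions
    "C": "ACGAACGTTCGA",           # differs at 3 positions (stand-in for the market group)
}

ranking = rank_by_outgroup_distance(toy_groups, toy_outgroup)
for name, dist in ranking:
    print(f"group {name}: {dist} differences from outgroup")
```

In this toy setup, group C carries every change seen in A and B plus one more, which is the pattern you would expect if the market sequences descended from viruses that were already circulating.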


So was SARS-CoV-2 designed in a lab?

It’s natural to seek someone to blame. A virus is a vague entity—hard to fight and hard to understand (we’re trying, but it takes time). We seek a narrative where there is a villain, and where we can see the villain get their comeuppance, and so it is understandable that fears of SARS-CoV-2 being deliberately designed and released have gained traction.

Fortunately, we finally have the data needed to give a more confident answer to the question: is it likely that SARS-CoV-2 is man-made? This is a widespread fear on social media, so let’s take a closer look.

What would it take to convince us this virus was made, designed, or altered in a lab? Nature works by random mutation and selection over time to develop new tools and abilities. Human engineering of viruses, by contrast, mainly operates in some combination of two ways:

  1. We take pieces we know work elsewhere and swap them in, and/or

  2. We try to create artificial simulators of natural selection that train a virus to deal with new problems.

Could SARS-CoV-2 be man-made from pieces of other viruses?

Let’s address the first possibility. To reiterate, most of SARS-CoV-2 comes from a bat coronavirus closely related to RaTG13. This virus is not known to cause disease in humans. If we were virus engineers (and this actually happens to be my job in the Benhur Lee Lab) we would need to:

  1. Make a virus backbone from a never-before-seen virus that looks like, but isn’t, RaTG13 without having any reason to believe it would be a better starting place than a previously characterized virus (like the original SARS-CoV)

  2. Spend months to years building a system that is easy to engineer (reverse-genetics system) when there are other virus backbones readily available.

  3. Choose the RBD region from an unknown pangolin coronavirus even though all computer models show it should be suboptimal at binding ACE2, and show that it binds well in spite of the models (paper 1, paper 2, paper 3, paper 4)

All of these steps sound like bad ideas from a scientist’s perspective: there were easier ways to engineer a coronavirus, and no one would have rationally chosen either the bat virus backbone or the pangolin portion of the spike protein. Therefore, SARS-CoV-2 is unlikely to be man-made from pieces of other viruses—we have zero evidence that any person or lab has attempted even one part of this process.

But what if this virus was developed using simulated natural selection in a lab?

This is a good question and one we can answer in a few ways.

First, it is extremely unlikely that simulated natural selection would stumble on the near-exact RBD from a previously unknown pangolin coronavirus. It is far less likely than the virus simply stealing that RBD from the pangolin coronavirus via recombination in nature.

Second, what about the polybasic cleavage site and the O-linked glycan? We have seen, with other viruses, the ability to develop polybasic cleavage sites when put under just the right conditions for long periods of time. While unlikely, this piece of the virus could plausibly be developed through selection in a lab setting. However, what is nearly impossible is the development of the O-linked glycan addition motif. This is because the pressure to develop this glycan shield requires avoiding an intact immune system. This type of selection cannot occur using cell culture, and there is no known animal model that would allow for selection of human-like ACE2 binding and avoidance of immune recognition. This strongly implies SARS-CoV-2 could not have been developed in a lab, even by a system of simulated natural selection.


But wait, there’s more…

Natural or Unnatural Selection: The Ka/Ks Ratio

The Ka/Ks ratio gives us powerful insight into the history of anything with a genome. Before using it to assess SARS-CoV-2, we first need a quick biology intro to the types of mutations a genome can experience. Here’s what you need to know:

  • Many mutations to a genome won’t actually result in a functional change (synonymous mutations)

  • Some mutations to the genome will result in a functional change (non-synonymous mutations)

  • By comparing the number of synonymous and non-synonymous mutations, we can infer the type of selective pressure that occurred

Because synonymous mutations should have no effect, we expect them to happen at a relatively consistent rate. That makes them a good baseline that we can compare the number of non-synonymous mutations to. By calculating the ratio between these two numbers we can differentiate between three different types of selection:

  1. Purifying selection: This virus is already a great fit where it is and cannot afford to change because every change makes it worse. You should see very few non-synonymous changes here.

  2. Darwinian selection: This virus is not a good fit where it is and has to change and get better or it’s going to die out. You should see many non-synonymous changes.

  3. Neutral selection: There is no pressure on this virus either way. Non-synonymous changes and synonymous changes should come at about the same rate.
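To make the comparison above concrete, here is a minimal Python sketch. It classifies each differing codon between two aligned coding sequences as synonymous or non-synonymous using the standard genetic code, and it also estimates a rough neutral baseline by enumerating every possible single-nucleotide change to every amino-acid-coding codon. The function names and the toy sequences are assumptions made for illustration. Note that this is a simplified codon-level count and not the site-normalized Ka/Ks calculation used in published analyses (for example, the Nei-Gojobori method), and it assumes uppercase DNA with no gaps.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG ordering ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_changes(ancestor, descendant):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(ancestor) != len(descendant) or len(ancestor) % 3 != 0:
        raise ValueError("need two aligned coding sequences of equal length, a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(ancestor), 3):
        a, d = ancestor[i:i + 3], descendant[i:i + 3]
        if a == d:
            continue
        if CODON_TABLE[a] == CODON_TABLE[d]:
            syn += 1       # codon changed, amino acid did not
        else:
            nonsyn += 1    # codon changed and so did the amino acid (or a stop appeared)
    return syn, nonsyn


def neutral_nonsyn_fraction():
    """Fraction of all single-nucleotide changes to sense codons that alter the protein."""
    changed = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only start from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                changed += CODON_TABLE[mutant] != aa
    return changed / total


# Toy example (invented sequences): GCT->GCC keeps alanine, AAA->AGA swaps lysine for arginine.
syn, nonsyn = count_changes("ATGGCTAAA", "ATGGCCAGA")
observed = nonsyn / (syn + nonsyn)
print(f"observed non-synonymous fraction: {observed:.2f}")
print(f"rough neutral baseline: {neutral_nonsyn_fraction():.2f}")
```

An observed fraction well below the baseline is the pattern expected under purifying selection, while a fraction well above it would point toward Darwinian selection.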

We would expect a virus that is learning to exist in a new context to be undergoing Darwinian selection, and we would see a high rate of non-synonymous changes in some part of the genome. The same would be true if the virus were being designed via simulated natural selection: we would expect at least some part of the genome to show Darwinian selection.

In an analysis using an open-source program (that you can try at home), Dr. Trevor Bedford began with the sequences of all viruses related to SARS-CoV-2. He next calculated the Ka/Ks ratios when comparing SARS-CoV-2 to related viruses. He also calculated the Ka/Ks ratio between SARS-CoV-2 and a hypothetical ancestor virus predicted by his program. In his analysis, Dr. Bedford found that 14.3% of the mutations between SARS-CoV-2 and its predicted ancestor were non-synonymous. For RaTG13, a natural coronavirus, 14.2% of mutations were non-synonymous. Both of these numbers indicate purifying selection, with very few non-synonymous changes. This holds true across the entire genome, with no part of it showing Darwinian selection. This is a very strong indicator that SARS-CoV-2 was not designed using forced selection in a lab.
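The claim that no part of the genome shows Darwinian selection suggests a region-by-region check. The following Python sketch shows one simple way such a scan could look: label every codon as unchanged, synonymous, or non-synonymous, then report the non-synonymous fraction in each window. This is not Dr. Bedford’s pipeline; the function names, the window size, and the toy sequences are all assumptions for illustration, and the input is assumed to be two aligned, gap-free coding sequences in uppercase.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in the conventional TCAG ordering ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def label_codons(ancestor, descendant):
    """Label each codon '-' (unchanged), 'S' (synonymous) or 'N' (non-synonymous)."""
    if len(ancestor) != len(descendant) or len(ancestor) % 3 != 0:
        raise ValueError("need two aligned coding sequences of equal length, a multiple of 3")
    labels = []
    for i in range(0, len(ancestor), 3):
        a, d = ancestor[i:i + 3], descendant[i:i + 3]
        if a == d:
            labels.append("-")
        elif CODON_TABLE[a] == CODON_TABLE[d]:
            labels.append("S")
        else:
            labels.append("N")
    return labels


def scan_windows(labels, window):
    """Return (start_codon, non_synonymous_fraction) for each non-overlapping window with changes."""
    results = []
    for start in range(0, len(labels), window):
        changes = [c for c in labels[start:start + window] if c != "-"]
        if changes:  # skip windows with no differences at all
            results.append((start, changes.count("N") / len(changes)))
    return results


# Toy sequences (invented): the first four codons carry only silent changes,
# the last four carry only amino-acid-changing ones.
toy_ancestor = "ATGGCTAAACTGGAATTTGGTCCA"
toy_descendant = "ATGGCCAAACTAGATTTTGGTCAA"

scan = scan_windows(label_codons(toy_ancestor, toy_descendant), window=4)
for start, fraction in scan:
    print(f"codons {start}-{start + 3}: non-synonymous fraction {fraction:.2f}")
```

In a scan like this, a window whose non-synonymous fraction stands far above the rest of the genome would be a candidate for Darwinian selection. According to the analysis described above, no such region was found in SARS-CoV-2.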


The Wuhan Institute of Virology

Conspiracy theories about SARS-CoV-2 originating from a lab have been reported in the Daily Mail, the Washington Times, outlets in India, and on Fox News. They tend to implicate the Wuhan Institute of Virology.

In addition to our own debunking of the bioweapon conspiracies, many experts agree the Wuhan Institute of Virology would be an unlikely location for that kind of research. Not only is there no evidence of bioweapons work in Wuhan, it’s also an unlikely location for secret bioweapons research due to its international openness (relative to other Chinese labs). French engineers helped to design the Institute and it also has connections to the Galveston National Laboratory at the University of Texas Medical Branch.

Map courtesy of The Daily Mail

Other conspiracy-driven articles have cited an unreviewed preprint in which the authors claim that hints of HIV sequences are hidden in the SARS-CoV-2 genome. Importantly, this preprint was voluntarily withdrawn by the authors after they revisited their own data. They thought they had found a sequence of RNA in SARS-CoV-2 that was missing in closely related viruses, but they were wrong. The authors then looked at that sequence, saw that it looked a little similar to HIV, and rushed to print their results. The sequence looked equally like a piece of DNA from multiple plants, bacteria, and even other viruses, but in their rush to print, the authors overlooked all of these issues. This goes to show that while preprints can be an important way to get data out early, it’s important to take great care in double-checking your own work even before the stage of peer review.

The now-withdrawn preprint even temporarily concerned NYT Op-Ed writer Ross Douthat. It shows the importance of waiting for peer review before jumping to conclusions.

Most stories about SARS-CoV-2 being intentionally designed or a bioweapon have been retracted or have had disclaimers added to them like this one from a Jan. 25 article in the Washington Times:

“Editor’s note (March 25, 2020): Since this story ran, scientists outside of China have had a chance to study the SARS-CoV-2 virus. They concluded it does not show signs of having been manufactured or purposefully manipulated in a lab, though the exact origin remains murky and experts debate whether it may have leaked from a Chinese lab that was studying it.”

Thankfully, there has been increasing pushback against the bioweapon conspiracy theory. It’s become clear that having a very transmissible, difficult to control virus, with unpredictable lethality would make for a very ineffective bioweapon. Even so, conspiracy theories and rumors persist.


What About a Leak?

In other news, Iran’s Supreme Leader, Ayatollah Khamenei, blamed the virus on the U.S., while Chinese Foreign Ministry Spokesperson Zhao Lijian also argued the U.S. may have leaked the virus. Chinese government news amplified these rumors while keeping the precise accusations vague, even including Italy as a possible origin. While none of these theories have any supporting evidence, they often use a common set of facts out of context. Here are a couple to help you out:

  • Papers are cited on Twitter to imply that their authors may have worked with SARS-CoV-2 or something like it.

    • There has never been a paper published with a virus that could have been the true precursor to SARS-CoV-2.

    • The fact that these other papers are published goes to show that had a group found a similar virus, they would have been very likely to publish it.

  • Videos and articles sometimes argue that military labs like Fort Detrick are the true source of the virus and that it made it to Wuhan via the Wuhan Military Games from U.S. soldiers.

    • If this were true, we would see a set of SARS-CoV-2 genomes in the U.S. unrelated to all others, which, as described above, we do not.

But even politicians in the United States have begun to take the bait and spread disinformation. Senator Tom Cotton (R-AR) has doubled down on the theory of an accidental leak from the Wuhan Institute of Virology, for example, but this is based on mistrust of China rather than any evidence: “[w]e don’t have evidence that this disease originated there,” Senator Cotton said, “but because of China’s duplicity and dishonesty from the beginning, we need to at least ask the question to see what the evidence says, and China right now is not giving evidence on that question at all.” The idea of a lab leak has been a popular theory and even received a Washington Post Op-Ed. The rumor spreads even though its strongest public supporters, like Senator Cotton, have since said that a natural origin of SARS-CoV-2 is “still the most likely.” But the damage comes from amplifying dangerous theories even when there is zero supporting evidence, making it all the harder for scientists like us to debunk them when they are unlikely to be true.


The Final Word

Hopefully the evidence presented in our articles has convinced you that SARS-CoV-2 was neither purposefully manipulated nor released from a secret lab. Moreover, we presented a viable hypothesis for its development in the wild that fits all available evidence. However, until we find a smoking gun in the wild, we can neither prove nor disprove the origins of this virus. That’s how science works.

Science operates on data and what we can say with confidence. Future hypotheses worth testing should:

  1. Fit the data gathered so far

  2. Be testable and falsifiable

  3. Maximize parsimony and best satisfy the concept of Occam’s Razor

In summary, the current data for SARS-CoV-2 indicates this virus evolved naturally and allows us to say that any virus origins scenario involving a laboratory is deeply unlikely.


Post Script: The Role of Humans

While this disease was not designed in a lab, this does not take humans off the hook for this pandemic. It is not intermediate horseshoe bats and Malayan pangolins that shoulder the responsibility for pandemics. The rise of zoonotic emerging diseases is due to humans encroaching on animals, not the other way around. Risks to people decrease dramatically by protecting wildlife from trafficking and limiting encroachment into wild habitats. Putting our resources towards surveillance and working to better understand questions such as ‘why are bats less affected by many viruses’ could provide us the tools we need to fight future outbreaks.

We may find that turning a critical eye towards our own interactions with the environment is a more useful conversation to have about the origins of SARS-CoV-2 instead of amplifying conspiracy theories about shadowy actors. We may even find policy solutions that reduce the chances of a future pandemic by learning lessons from this one.


Thank you for reading all three parts on the Origins of SARS-CoV-2. Don’t hesitate to email us any additional questions you may have about SARS-CoV-2 and COVID-19, and stay tuned for even more articles!


Christian Stevens

Christian is an MD/PhD Student at the Mount Sinai School of Medicine who got his BS from Harvey Mudd College.

He joined the Benhur Lee Lab in 2018 and has since worked on two main projects. The first uses viral engineering to explore the use of Sendai Virus as a viral vector to deliver gene editing tools. The second has involved computational work building pipelines for analysis using both Illumina sequencing and Oxford Nanopore direct-RNA sequencing. Christian’s main interests have been in directing world class clinical research towards the most marginalized patients, especially in the fields of infectious disease and virology.

christian.stevens@icahn.mssm.edu

@csstevens91
