Arc Virtual Cell Atlas: scRNA-seq¶
The Arc Virtual Cell Atlas hosts one of the biggest collections of scRNA-seq datasets.
Lamin mirrors the dataset for simplified access here: laminlabs/arc-virtual-cell-atlas.
If you use the data academically, please cite the original publications, Youngblut et al. (2025) and Zhang et al. (2025).
Connect to the source instance.
# pip install 'lamindb[jupyter,bionty,wetlab,gcp]'
!lamin connect laminlabs/arc-virtual-cell-atlas
→ connected lamindb: laminlabs/arc-virtual-cell-atlas
Note
If you want to transfer artifacts or metadata into your own instance, use .using("laminlabs/arc-virtual-cell-atlas") when accessing registries and then .save() (see Transfer data).
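The transfer described in the note could look roughly like the sketch below. It assumes that your current default instance is your own instance (not the source instance) and that `bionty` is installed; the helper name `transfer_cell_line` is hypothetical and not part of LaminDB.

```python
def transfer_cell_line(name: str, source: str = "laminlabs/arc-virtual-cell-atlas"):
    """Copy one cell line record from the source instance into the current instance.

    Assumption: the current default instance is your own instance,
    not the source instance itself.
    """
    import bionty as bt  # imported lazily, so defining the helper has no side effects

    # query the registry on the source instance, as described in the note
    record = bt.CellLine.using(source).get(name=name)
    # saving a record that was queried from another instance transfers it
    record.save()
    return record
```

Calling `transfer_cell_line("A549")` would then fetch the A549 record from the atlas instance and save it into your own instance.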
import lamindb as ln
import bionty as bt
import wetlab as wl
import pyarrow.compute as pc
→ connected lamindb: laminlabs/arc-virtual-cell-atlas
Metadata¶
50 cell lines.
bt.CellLine.df()
id | uid | name | ontology_id | abbr | synonyms | description | space_id | source_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 505Oto0b | NCI-H1573 | CVCL_1478 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
2 | 2yrJ1RO9 | NCI-H460 | CVCL_0459 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
3 | 729jQiCV | hTERT-HPNE | CVCL_C466 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
4 | 2MwkQgWO | SW48 | CVCL_1724 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
5 | 6SFnBlyJ | HOP62 | CVCL_1285 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
6 | 39vFskbz | NCI-H1792 | CVCL_1495 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
7 | 3Yy5mGIS | SW480 | CVCL_0546 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
8 | 1p59Uds7 | HT-29 | CVCL_0320 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
9 | 7KUHx7VC | LoVo | CVCL_0399 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
10 | 4Ch2fV9a | Hs 766T | CVCL_0334 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
11 | 65gF96lU | PANC-1 | CVCL_0480 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
12 | 6h8KcJYp | MIA PaCa-2 | CVCL_0428 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
13 | 7aEsdKjg | SW 1271 | CVCL_1716 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
14 | 4n6gUGHY | RKO | CVCL_0504 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
15 | 1v7Mehiu | H4 | CVCL_1239 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
16 | HLBTHKPg | SW1417 | CVCL_1717 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
17 | tdp1HNAN | CFPAC-1 | CVCL_1119 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
18 | 5Jp9rqX7 | SW 900 | CVCL_1731 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
19 | 5Vjc1Ubr | KATO III | CVCL_0371 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
20 | EtRJf7f9 | C-33 A | CVCL_1094 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
21 | 18kaNqu0 | SNU-1 | CVCL_0099 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
22 | 4IfJB0Y2 | J82 | CVCL_0359 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
23 | 3PDtUj4s | A-172 | CVCL_0131 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
24 | NvPXo2Hu | SHP-77 | CVCL_1693 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
25 | 5N1doHnP | SNU-423 | CVCL_0366 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
26 | 6GOSOOui | HS-578T | CVCL_0332 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
27 | 7QShig8F | A498 | CVCL_1056 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
28 | 6NWX3dtq | NCI-H2347 | CVCL_1550 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
29 | VEd9akJo | LOX-IMVI | CVCL_1381 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
30 | bC8JbRlg | NCI-H23 | CVCL_1547 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
31 | 5ewuYry0 | Panc 03.27 | CVCL_1635 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
32 | 1CdEQ5dJ | LS 180 | CVCL_0397 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
33 | 1mLuzzow | HEC-1-A | CVCL_0293 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
34 | 63ZWvcHV | HCT15 | CVCL_0292 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
35 | 5XupQdHO | COLO 205 | CVCL_0218 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
36 | 4hyU9oFu | BT-474 | CVCL_0179 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
37 | 5lqReFKR | AN3 CA | CVCL_0028 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
38 | 1lhqeW2v | RPMI-7951 | CVCL_1666 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
39 | 39rNVaPP | SK-MEL-2 | CVCL_0069 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
40 | vEfTp1Hk | A549 | CVCL_0023 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
41 | 4QH2SpWA | NCI-H2030 | CVCL_1517 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
42 | 1K1CzNSi | C32 | CVCL_1097 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
43 | 3Oz9gRsu | HepG2/C3A | CVCL_1098 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
44 | 5JZNtoDJ | AsPC-1 | CVCL_0152 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
45 | 2eQosYls | CHP-212 | CVCL_1125 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
46 | 219BOZMe | SW 1088 | CVCL_1715 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
47 | 6O2MPQMm | A-427 | CVCL_1055 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
48 | J5Ylm8TV | NCI-H2122 | CVCL_1531 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
49 | 7dL2LJjx | NCI-H661 | CVCL_1577 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
50 | 7VaGVBNB | NCI-H596 | CVCL_1571 | None | None | None | 1 | None | 3 | 2025-02-25 22:20:20.217993+00:00 | 1 | None | 1 |
380 compounds.
wl.Compound.df(limit=None)
id | uid | name | ontology_id | chembl_id | abbr | synonyms | description | space_id | source_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
380 | JRDV3CsZ | Tolmetin | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
379 | x3BfjX6J | Peretinoin | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
378 | 18PJ8Lu8 | Niclosamide (olamine) | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
377 | 5yeFtKHy | Apalutamide | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
376 | 2AEfoFfq | Mifepristone | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5 | 4zIj247I | Filgotinib | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
4 | 3Eb9OOlc | Brimonidine | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
3 | 6sz7CwqK | Canagliflozin (hemihydrate) | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
2 | 6sTlb3V5 | Ataluren | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
1 | s0gVFSVr | Bestatin (hydrochloride) | None | None | None | None | None | 1 | None | 3 | 2025-02-25 22:48:58.568677+00:00 | 1 | None | 1 |
380 rows × 14 columns
1,138 perturbations.
wl.CompoundPerturbation.df(limit=None)
id | uid | name | description | concentration | concentration_unit | duration | space_id | compound_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1138 | kww9RaKf0mwA | [('Tolmetin', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 380 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
1137 | B9LsdpnTwXea | [('Peretinoin', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 379 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
1136 | Kh01GOU3CAj5 | [('Niclosamide (olamine)', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 378 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
1135 | aC855IwRNUln | [('Apalutamide', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 377 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
1134 | 331epg5UWWT3 | [('Mifepristone', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 376 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
218 | vcGZxPVAGYG1 | [('Auranofin', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 30 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
219 | eBv6ks1234Oq | [('Carbamazepine', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 31 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
220 | ADwDUaCfosFt | [('Flutamide', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 32 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
221 | UYxFfYf88QB8 | [('Zileuton', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 33 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
222 | lWQ3VNakcct2 | [('Osimertinib (mesylate)', 5.0, 'uM')] | None | 5.0 | uM | None | 1 | 34 | 3 | 2025-02-25 22:59:15.764901+00:00 | 1 | None | 1 |
1138 rows × 13 columns
17 metadata features.
ln.Feature.df()
id | uid | name | dtype | is_type | unit | description | array_rank | array_size | array_shape | proxy_dtype | synonyms | _expect_many | _curation | space_id | type_id | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
19 | gQE1h3fIBiSf | sample | cat[wetlab.Biosample] | None | None | unique treatment identifier, distinguishes rep... | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-26 10:59:36.743558+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
18 | fLwdFKBUhBY9 | drugname_drugconc | cat[wetlab.CompoundPerturbation] | None | None | drug name, concentration and concentration unit | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 23:04:17.541812+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
17 | Q0cj2JR5Juwn | drug | cat[wetlab.Compound] | None | None | drug name, as used in the drugname_drugconc field | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 23:02:05.717794+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
16 | dQELv2sIVnJX | BARCODE | str | None | None | barcode id | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:35:15.627971+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
15 | 3X4d0QEUuprp | sublibrary | str | None | None | sublibrary id (related to library prep and seq... | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:35:14.673178+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
12 | vw5PQ3jN6vJV | BARCODE_SUB_LIB_ID | str | None | None | cell identifier | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:35:12.976029+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
11 | KPT70T8xJLIt | cell_name | cat[bionty.CellLine] | None | None | commonly-used cell line name (related to the c... | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:32:56.082195+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
10 | CF0O0e0WZxFz | G2M_score | float | None | None | inferred G2M phase score | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:31:22.708895+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
9 | bujDkB4Nd1S5 | S_score | float | None | None | inferred S phase score | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:31:22.144135+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
8 | X640W5tBUPOQ | pcnt_mito | float | None | None | percentage mitochondrial reads | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:31:21.581885+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
7 | PZDiL36nJSFv | mread_count | int | None | None | number of reads per cell | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:30:31.810331+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
6 | LHUmmYKjIGPl | tscp_count | int | None | None | number of transcripts, aka UMI count | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:30:31.236532+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
5 | IjSP1lCY3Hyw | gene_count | int | None | None | number of genes with at least one count | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:30:30.668750+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
4 | vshELphl73qp | cell_line | cat[bionty.CellLine.ontology_id] | None | None | cell line Cellosaurus identifier | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:27:22.393997+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
3 | PVpyJhciLdCQ | pass_filter | cat[ULabel[PassFilter]] | None | None | full' filters are more stringent on gene_count... | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:25:30.918235+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
2 | QboQ1Q1Yxsjn | phase | cat[ULabel[Phase]] | None | None | inferred cell cycle phase (G1, S, G2M) | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:21:56.935262+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
1 | YRSYWdIiesqL | plate | cat[ULabel[Plate]] | None | None | plate number | 0 | 0 | None | None | None | True | None | 1 | None | 3 | 2025-02-25 22:03:51.786985+00:00 | 1 | {'af': {'0': None, '1': True}} | 1 |
The Tahoe-100M collection¶
Every individual dataset in the atlas is an .h5ad file that is registered as an artifact in LaminDB.
Let us first query for the Tahoe-100M collection.
# get the collection: https://lamin.ai/laminlabs/arc-virtual-cell-atlas/collection/BpavRL4ntRTzWEE5
collection = ln.Collection.get(key="tahoe100")
# 14 artifacts in this collection, each corresponding to a plate
collection.artifacts.df()
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1369 | XVSrkq9pyF1OBLgG0000 | 2025-02-25/h5ad/plate3_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 13173722269 | Jnrt7DaSUCGn8D8LS2itaw | None | 4705402 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:20.497965+00:00 | 1 | None | 1 |
1366 | vn5cUJCHbjpPPsZx0000 | 2025-02-25/h5ad/plate14_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 22427932564 | FrnStRehP16siRGG35ou+g | None | 6518806 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:19.357999+00:00 | 1 | None | 1 |
1362 | 56uA9lPPmJ4zLUcr0000 | 2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 26536400717 | j1FXsX7hs7u+eBqnWnmNHw | None | 8044908 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:17.849980+00:00 | 1 | None | 1 |
1365 | 9L9HZ55HqUL0aqaR0000 | 2025-02-25/h5ad/plate13_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 28071589885 | RKOiaay+CHvv+Ukk/N+28A | None | 8501658 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:18.977981+00:00 | 1 | None | 1 |
1372 | aAHQ3zbD7n1asyYr0000 | 2025-02-25/h5ad/plate6_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 28934897078 | NYvQEqVClziHm0ozWhOw1w | None | 7545393 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:21.629962+00:00 | 1 | None | 1 |
1373 | DC5cacdJr1VoEXnl0000 | 2025-02-25/h5ad/plate7_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 16514746341 | NOS4MY6eYYPOnAB8ViyWYg | None | 5692117 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:22.009157+00:00 | 1 | None | 1 |
1375 | BDttiuV3Te8VB0dU0000 | 2025-02-25/h5ad/plate9_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 18791302576 | 4kHbVbmreg6akW6ZgsjxaA | None | 5866669 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:22.759201+00:00 | 1 | None | 1 |
1374 | czC19UpUEszVH2bU0000 | 2025-02-25/h5ad/plate8_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 30390935958 | ilAzEPIh4FlDeTFaJ1dILw | None | 8880979 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:22.387666+00:00 | 1 | None | 1 |
1370 | tKTeff0ugWqAm4P70000 | 2025-02-25/h5ad/plate4_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 23292672278 | BkBXznbSovNWXtzPFITPcQ | None | 7004356 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:20.879928+00:00 | 1 | None | 1 |
1371 | EZATJLC4jE7pmwo40000 | 2025-02-25/h5ad/plate5_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 19763140865 | VMBKFzOI5cj7UC1UDENP4A | None | 6419498 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:21.255154+00:00 | 1 | None | 1 |
1367 | aJIqo7bNyJAs9z0r0000 | 2025-02-25/h5ad/plate1_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 19070623904 | 9iCNcouMqfNS3HA/2GUWOA | None | 5481420 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:19.737995+00:00 | 1 | None | 1 |
1364 | S2h2rPLCaUhZAM9u0000 | 2025-02-25/h5ad/plate12_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 37495736876 | VjAkWVFGVpzAMi9Innusuw | None | 10487057 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:18.600910+00:00 | 1 | None | 1 |
1368 | ZFeVfd0ugAHeWCxm0000 | 2025-02-25/h5ad/plate2_filt_Vevo_Tahoe100M_WSe... | None | .h5ad | dataset | AnnData | 29037152127 | usxviuqGbuw0RYnECCVCWw | None | 8064658 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:20.113956+00:00 | 1 | None | 1 |
1363 | omn7JStfJMzy8m6O0000 | 2025-02-25/h5ad/plate11_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 23230802756 | N2mzoYlMLEl6PdecaYyDvw | None | 7435869 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:18.229629+00:00 | 1 | None | 1 |
# check the curated metadata of the first artifact
artifact1 = collection.artifacts.all()[0]
artifact1.describe()
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
Artifact .h5ad/AnnData
├── General
│   ├── .uid = '56uA9lPPmJ4zLUcr0000'
│   ├── .key = '2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WServicesFrom_ParseGigalab.h5ad'
│   ├── .size = 26536400717
│   ├── .hash = 'j1FXsX7hs7u+eBqnWnmNHw'
│   ├── .n_observations = 8044908
│   ├── .path = gs://arc-ctc-tahoe100/2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WServicesFrom_ParseGigalab.h5ad
│   ├── .created_by = sunnyosun (Sunny Sun)
│   ├── .created_at = 2025-02-25 23:22:17
│   └── .transform = 'Register Tahoe-100M'
├── Dataset features/schema
│   ├── var • 62710 [bionty.Gene.stable_id]
│   │   TSPAN6             float
│   │   TNMD               float
│   │   DPM1               float
│   │   SCYL3              float
│   │   C1orf112           float
│   │   FGR                float
│   │   CFH                float
│   │   FUCA2              float
│   │   GCLC               float
│   │   NFYA               float
│   │   STPG1              float
│   │   NIPAL3             float
│   │   LAS1L              float
│   │   ENPP4              float
│   │   SEMA3F             float
│   │   CFTR               float
│   │   ANKIB1             float
│   │   CYP51A1            float
│   │   KRIT1              float
│   │   RAD52              float
│   └── obs • 16 [Feature]
│       cell_line          cat[bionty.CellLine.onto…  A-172, A-427, A498, A549, AN3 CA, AsPC-1…
│       cell_name          cat[bionty.CellLine]       A-172, A-427, A498, A549, AN3 CA, AsPC-1…
│       drug               cat[wetlab.Compound]       5-Azacytidine, 5-Fluorouracil, Abiratero…
│       drugname_drugconc  cat[wetlab.CompoundPertu…  [('5-Azacytidine', 0.05, 'uM')], [('5-Fl…
│       pass_filter        cat[ULabel[PassFilter]]    full, minimal
│       phase              cat[ULabel[Phase]]         G1, G2M, S
│       plate              cat[ULabel[Plate]]         plate10
│       sample             cat[wetlab.Biosample]      smp_2359, smp_2360, smp_2361, smp_2362, …
│       gene_count         int
│       tscp_count         int
│       mread_count        int
│       pcnt_mito          float
│       S_score            float
│       G2M_score          float
│       sublibrary         str
│       BARCODE            str
└── Labels
    └── .projects                Project                    Tahoe-100M
        .references              Reference                  Tahoe-100M: A Giga-Scale Single-Cell Per…
        .organisms               bionty.Organism            human
        .cell_lines              bionty.CellLine            Panc 03.27, PANC-1, NCI-H460, HEC-1-A, M…
        .compounds               wetlab.Compound            Acetazolamide, Neratinib, Tazarotene, 5-…
        .compound_perturbations  wetlab.CompoundPerturbat…  [('5-Azacytidine', 0.05, 'uM')], [('Iver…
        .biosamples              wetlab.Biosample           smp_2430, smp_2365, smp_2360, smp_2369, …
        .ulabels                 ULabel                     tahoe-100, plate10, G1, G2M, S, full, mi…
Query artifacts of interest based on metadata¶
Let’s find which datasets contain A549 cells perturbed with Piroxicam.
drugs = wl.Compound.lookup()
cell_lines = bt.CellLine.lookup()
collection.artifacts.filter(compounds=drugs.piroxicam, cell_lines=cell_lines.a549).df()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1362 | 56uA9lPPmJ4zLUcr0000 | 2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 26536400717 | j1FXsX7hs7u+eBqnWnmNHw | None | 8044908 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:17.849980+00:00 | 1 | None | 1 |
1363 | omn7JStfJMzy8m6O0000 | 2025-02-25/h5ad/plate11_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 23230802756 | N2mzoYlMLEl6PdecaYyDvw | None | 7435869 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:18.229629+00:00 | 1 | None | 1 |
1364 | S2h2rPLCaUhZAM9u0000 | 2025-02-25/h5ad/plate12_filt_Vevo_Tahoe100M_WS... | None | .h5ad | dataset | AnnData | 37495736876 | VjAkWVFGVpzAMi9Innusuw | None | 10487057 | md5 | False | False | 1 | 2 | 3 | None | True | 1 | 2025-02-25 23:22:18.600910+00:00 | 1 | None | 1 |
You can download an .h5ad into your local cache:
artifact1.cache()
Or stream it:
artifact1.open()
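Whether to cache or stream depends largely on local disk space, since a single plate is tens of gigabytes. The sketch below is a hypothetical heuristic, not part of LaminDB: the helper `prefer_cache`, the 2x headroom factor, and checking the current directory rather than the actual cache location are all assumptions made for illustration.

```python
import shutil

# size in bytes of artifact1 (plate10), taken from the .size field shown above
PLATE10_SIZE = 26_536_400_717


def prefer_cache(artifact_size: int, free_bytes: int, headroom: float = 2.0) -> bool:
    """Return True if the free space covers the artifact size times a headroom factor."""
    return free_bytes >= artifact_size * headroom


# free bytes on the volume that holds the current directory (assumed cache volume)
free = shutil.disk_usage(".").free
choice = "artifact1.cache()" if prefer_cache(PLATE10_SIZE, free) else "artifact1.open()"
print(f"{PLATE10_SIZE / 1e9:.1f} GB artifact, {free / 1e9:.1f} GB free: use {choice}")
```

If you only need a subset of cells, streaming is usually the more economical choice regardless of disk space.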
Open the obs metadata parquet file as a PyArrow Dataset¶
Open the obs metadata file (2.29 GB) with PyArrow.Dataset.
ulabels = ln.ULabel.lookup()
parquet_artifact = ln.Artifact.filter(
    key__contains="obs_metadata.parquet", ulabels=ulabels.tahoe_100
).one()
parquet_artifact
Artifact(uid='y1TTR9wbrmZEwpOa0000', is_latest=True, key='2025-02-25/metadata/obs_metadata.parquet', suffix='.parquet', otype='DataFrame', size=2293981573, hash='qEWOpGw9CmQVzaElyMWT1Q', n_observations=100648790, space_id=1, storage_id=2, run_id=1, created_by_id=1, created_at=2025-02-25 19:33:42 UTC)
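As a quick sanity check, the `n_observations` of this parquet file should equal the total number of cells across the 14 plate artifacts. The short sketch below redoes that arithmetic with the per-plate values copied by hand from the collection table above; the variable names are only illustrative.

```python
# n_observations per plate, copied from the collection table above
n_obs_per_plate = {
    "plate1": 5_481_420,
    "plate2": 8_064_658,
    "plate3": 4_705_402,
    "plate4": 7_004_356,
    "plate5": 6_419_498,
    "plate6": 7_545_393,
    "plate7": 5_692_117,
    "plate8": 8_880_979,
    "plate9": 5_866_669,
    "plate10": 8_044_908,
    "plate11": 7_435_869,
    "plate12": 10_487_057,
    "plate13": 8_501_658,
    "plate14": 6_518_806,
}

# n_observations of the obs metadata parquet artifact shown above
OBS_METADATA_ROWS = 100_648_790

total = sum(n_obs_per_plate.values())
print(len(n_obs_per_plate), total, total == OBS_METADATA_ROWS)
# → 14 100648790 True
```

The two numbers agree, so the parquet file holds one row of obs metadata for each cell in the collection.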
dataset = parquet_artifact.open()
dataset.schema
! run input wasn't tracked, call `ln.track()` and re-run
plate: string
BARCODE_SUB_LIB_ID: string
sample: string
gene_count: int64
tscp_count: int64
mread_count: int64
drugname_drugconc: string
drug: string
cell_line: dictionary<values=string, indices=int32, ordered=0>
sublibrary: string
BARCODE: string
pcnt_mito: float
S_score: double
G2M_score: double
phase: dictionary<values=string, indices=int32, ordered=0>
pass_filter: dictionary<values=string, indices=int32, ordered=0>
cell_name: dictionary<values=string, indices=int32, ordered=0>
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 2487
Find the A549 cells that are perturbed with Piroxicam.
filter_expr = (pc.field("drug") == drugs.piroxicam.name) & (
    pc.field("cell_name") == cell_lines.a549.name
)
df = dataset.scanner(filter=filter_expr).to_table().to_pandas()
df.value_counts("plate")
plate
plate12 2818
plate10 2812
plate11 2279
Name: count, dtype: int64
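The query above materializes every column before counting. If only the per-plate counts are needed, the scan can be restricted to the `plate` column, which should reduce the amount of data read. The following is a minimal sketch under that assumption: the helper name `count_cells_per_plate` is hypothetical, and `dataset` is expected to be a PyArrow dataset such as the one returned by `parquet_artifact.open()`.

```python
def count_cells_per_plate(dataset, drug: str, cell_name: str) -> dict:
    """Count matching cells per plate while reading only the `plate` column."""
    import pyarrow.compute as pc  # imported lazily; requires pyarrow

    # same filter as above, expressed with plain strings
    expr = (pc.field("drug") == drug) & (pc.field("cell_name") == cell_name)
    # project to a single column; the filter columns are still used for filtering
    table = dataset.scanner(filter=expr, columns=["plate"]).to_table()
    # value_counts returns a struct array with the fields "values" and "counts"
    counts = pc.value_counts(table["plate"]).to_pylist()
    return {entry["values"]: entry["counts"] for entry in counts}
```

For example, `count_cells_per_plate(dataset, "Piroxicam", "A549")` would be expected to reproduce the per-plate counts shown above.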
df.head()
plate | BARCODE_SUB_LIB_ID | sample | gene_count | tscp_count | mread_count | drugname_drugconc | drug | cell_line | sublibrary | BARCODE | pcnt_mito | S_score | G2M_score | phase | pass_filter | cell_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
29314 | plate10 | 50_030_183-lib_1681 | smp_2408 | 644 | 863 | 1024 | [('Piroxicam', 0.05, 'uM')] | Piroxicam | CVCL_0023 | lib_1681 | 50_030_183 | 0.101970 | -0.282297 | -0.165568 | G1 | full | A549 |
29337 | plate10 | 50_035_135-lib_1681 | smp_2408 | 1130 | 1570 | 1827 | [('Piroxicam', 0.05, 'uM')] | Piroxicam | CVCL_0023 | lib_1681 | 50_035_135 | 0.077070 | -0.335042 | -0.280220 | G1 | full | A549 |
29338 | plate10 | 50_035_171-lib_1681 | smp_2408 | 1058 | 1534 | 1809 | [('Piroxicam', 0.05, 'uM')] | Piroxicam | CVCL_0023 | lib_1681 | 50_035_171 | 0.124511 | -0.402028 | -0.404579 | G1 | full | A549 |
29352 | plate10 | 50_038_157-lib_1681 | smp_2408 | 1265 | 1883 | 2240 | [('Piroxicam', 0.05, 'uM')] | Piroxicam | CVCL_0023 | lib_1681 | 50_038_157 | 0.147106 | -0.455343 | -0.311355 | G1 | full | A549 |
29355 | plate10 | 50_039_078-lib_1681 | smp_2408 | 1355 | 1914 | 2258 | [('Piroxicam', 0.05, 'uM')] | Piroxicam | CVCL_0023 | lib_1681 | 50_039_078 | 0.070010 | -0.349396 | 0.186264 | G2M | full | A549 |
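The `drugname_drugconc` column stores each perturbation as a string that looks like a Python list of `(drug, concentration, unit)` tuples. A minimal sketch for turning such a string into structured values follows; it assumes that every value uses the format seen in the rows above, and the helper name `parse_drugname_drugconc` is hypothetical.

```python
import ast


def parse_drugname_drugconc(value: str) -> list[tuple[str, float, str]]:
    """Parse a drugname_drugconc string into (drug, concentration, unit) tuples."""
    # literal_eval only evaluates Python literals, so no arbitrary code is executed
    parsed = ast.literal_eval(value)
    return [(str(name), float(conc), str(unit)) for name, conc, unit in parsed]


print(parse_drugname_drugconc("[('Piroxicam', 0.05, 'uM')]"))
# → [('Piroxicam', 0.05, 'uM')]
```

Applied to the column, for example with `df["drugname_drugconc"].map(parse_drugname_drugconc)`, this gives numeric concentrations that can be filtered or grouped directly.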