
Arc Virtual Cell Atlas: scRNA-seq

The Arc Virtual Cell Atlas hosts one of the largest collections of single-cell RNA-seq (scRNA-seq) datasets.

Lamin mirrors the datasets in the LaminDB instance laminlabs/arc-virtual-cell-atlas for simplified access.

If you use the data academically, please cite the original publications, Youngblut et al. (2025) and Zhang et al. (2025).

Connect to the source instance.

# pip install 'lamindb[jupyter,bionty,wetlab,gcp]'
!lamin connect laminlabs/arc-virtual-cell-atlas
 connected lamindb: laminlabs/arc-virtual-cell-atlas

Note

If you want to transfer artifacts or metadata into your own instance, access the registries with .using("laminlabs/arc-virtual-cell-atlas") and then call .save() on the retrieved records (see the Transfer data guide).

import lamindb as ln
import bionty as bt
import wetlab as wl
import pyarrow.compute as pc
 connected lamindb: laminlabs/arc-virtual-cell-atlas

Metadata

50 cell lines.

bt.CellLine.df()
uid name ontology_id abbr synonyms description space_id source_id run_id created_at created_by_id _aux _branch_code
id
1 505Oto0b NCI-H1573 CVCL_1478 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
2 2yrJ1RO9 NCI-H460 CVCL_0459 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
3 729jQiCV hTERT-HPNE CVCL_C466 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
4 2MwkQgWO SW48 CVCL_1724 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
5 6SFnBlyJ HOP62 CVCL_1285 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
6 39vFskbz NCI-H1792 CVCL_1495 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
7 3Yy5mGIS SW480 CVCL_0546 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
8 1p59Uds7 HT-29 CVCL_0320 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
9 7KUHx7VC LoVo CVCL_0399 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
10 4Ch2fV9a Hs 766T CVCL_0334 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
11 65gF96lU PANC-1 CVCL_0480 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
12 6h8KcJYp MIA PaCa-2 CVCL_0428 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
13 7aEsdKjg SW 1271 CVCL_1716 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
14 4n6gUGHY RKO CVCL_0504 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
15 1v7Mehiu H4 CVCL_1239 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
16 HLBTHKPg SW1417 CVCL_1717 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
17 tdp1HNAN CFPAC-1 CVCL_1119 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
18 5Jp9rqX7 SW 900 CVCL_1731 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
19 5Vjc1Ubr KATO III CVCL_0371 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
20 EtRJf7f9 C-33 A CVCL_1094 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
21 18kaNqu0 SNU-1 CVCL_0099 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
22 4IfJB0Y2 J82 CVCL_0359 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
23 3PDtUj4s A-172 CVCL_0131 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
24 NvPXo2Hu SHP-77 CVCL_1693 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
25 5N1doHnP SNU-423 CVCL_0366 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
26 6GOSOOui HS-578T CVCL_0332 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
27 7QShig8F A498 CVCL_1056 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
28 6NWX3dtq NCI-H2347 CVCL_1550 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
29 VEd9akJo LOX-IMVI CVCL_1381 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
30 bC8JbRlg NCI-H23 CVCL_1547 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
31 5ewuYry0 Panc 03.27 CVCL_1635 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
32 1CdEQ5dJ LS 180 CVCL_0397 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
33 1mLuzzow HEC-1-A CVCL_0293 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
34 63ZWvcHV HCT15 CVCL_0292 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
35 5XupQdHO COLO 205 CVCL_0218 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
36 4hyU9oFu BT-474 CVCL_0179 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
37 5lqReFKR AN3 CA CVCL_0028 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
38 1lhqeW2v RPMI-7951 CVCL_1666 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
39 39rNVaPP SK-MEL-2 CVCL_0069 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
40 vEfTp1Hk A549 CVCL_0023 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
41 4QH2SpWA NCI-H2030 CVCL_1517 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
42 1K1CzNSi C32 CVCL_1097 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
43 3Oz9gRsu HepG2/C3A CVCL_1098 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
44 5JZNtoDJ AsPC-1 CVCL_0152 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
45 2eQosYls CHP-212 CVCL_1125 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
46 219BOZMe SW 1088 CVCL_1715 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
47 6O2MPQMm A-427 CVCL_1055 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
48 J5Ylm8TV NCI-H2122 CVCL_1531 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
49 7dL2LJjx NCI-H661 CVCL_1577 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1
50 7VaGVBNB NCI-H596 CVCL_1571 None None None 1 None 3 2025-02-25 22:20:20.217993+00:00 1 None 1

380 compounds.

wl.Compound.df(limit=None)
uid name ontology_id chembl_id abbr synonyms description space_id source_id run_id created_at created_by_id _aux _branch_code
id
380 JRDV3CsZ Tolmetin None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
379 x3BfjX6J Peretinoin None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
378 18PJ8Lu8 Niclosamide (olamine) None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
377 5yeFtKHy Apalutamide None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
376 2AEfoFfq Mifepristone None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5 4zIj247I Filgotinib None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
4 3Eb9OOlc Brimonidine None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
3 6sz7CwqK Canagliflozin (hemihydrate) None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
2 6sTlb3V5 Ataluren None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1
1 s0gVFSVr Bestatin (hydrochloride) None None None None None 1 None 3 2025-02-25 22:48:58.568677+00:00 1 None 1

380 rows × 14 columns

1,138 perturbations.

wl.CompoundPerturbation.df(limit=None)
uid name description concentration concentration_unit duration space_id compound_id run_id created_at created_by_id _aux _branch_code
id
1138 kww9RaKf0mwA [('Tolmetin', 5.0, 'uM')] None 5.0 uM None 1 380 3 2025-02-25 22:59:15.764901+00:00 1 None 1
1137 B9LsdpnTwXea [('Peretinoin', 5.0, 'uM')] None 5.0 uM None 1 379 3 2025-02-25 22:59:15.764901+00:00 1 None 1
1136 Kh01GOU3CAj5 [('Niclosamide (olamine)', 5.0, 'uM')] None 5.0 uM None 1 378 3 2025-02-25 22:59:15.764901+00:00 1 None 1
1135 aC855IwRNUln [('Apalutamide', 5.0, 'uM')] None 5.0 uM None 1 377 3 2025-02-25 22:59:15.764901+00:00 1 None 1
1134 331epg5UWWT3 [('Mifepristone', 5.0, 'uM')] None 5.0 uM None 1 376 3 2025-02-25 22:59:15.764901+00:00 1 None 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
218 vcGZxPVAGYG1 [('Auranofin', 5.0, 'uM')] None 5.0 uM None 1 30 3 2025-02-25 22:59:15.764901+00:00 1 None 1
219 eBv6ks1234Oq [('Carbamazepine', 5.0, 'uM')] None 5.0 uM None 1 31 3 2025-02-25 22:59:15.764901+00:00 1 None 1
220 ADwDUaCfosFt [('Flutamide', 5.0, 'uM')] None 5.0 uM None 1 32 3 2025-02-25 22:59:15.764901+00:00 1 None 1
221 UYxFfYf88QB8 [('Zileuton', 5.0, 'uM')] None 5.0 uM None 1 33 3 2025-02-25 22:59:15.764901+00:00 1 None 1
222 lWQ3VNakcct2 [('Osimertinib (mesylate)', 5.0, 'uM')] None 5.0 uM None 1 34 3 2025-02-25 22:59:15.764901+00:00 1 None 1

1138 rows × 13 columns
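Each CompoundPerturbation name is stored as a string. The drugname_drugconc obs column uses the same string format. Each string looks like a Python list of (drug, concentration, unit) tuples, for example [('Tolmetin', 5.0, 'uM')].

To work with doses programmatically, you can turn these strings back into tuples. Below is a minimal stdlib sketch. It assumes every value follows the list-of-3-tuples format shown in the table above. `parse_perturbation` is a hypothetical helper, not part of lamindb or wetlab.

```python
import ast
from typing import List, Tuple


def parse_perturbation(name: str) -> List[Tuple[str, float, str]]:
    """Parse "[('Tolmetin', 5.0, 'uM')]" into [('Tolmetin', 5.0, 'uM')].

    Hypothetical helper: assumes `name` is a Python literal list of
    (drug, concentration, unit) tuples, as in the table above.
    """
    # unlike eval(), literal_eval only accepts Python literals
    parsed = ast.literal_eval(name)
    return [(str(drug), float(conc), str(unit)) for drug, conc, unit in parsed]


print(parse_perturbation("[('Tolmetin', 5.0, 'uM')]"))  # [('Tolmetin', 5.0, 'uM')]
print(parse_perturbation("[('Niclosamide (olamine)', 5.0, 'uM')]"))
```

On a pandas DataFrame loaded from the obs metadata, `df["drugname_drugconc"].map(parse_perturbation)` would apply it row by row.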

17 metadata features.

ln.Feature.df()
uid name dtype is_type unit description array_rank array_size array_shape proxy_dtype synonyms _expect_many _curation space_id type_id run_id created_at created_by_id _aux _branch_code
id
19 gQE1h3fIBiSf sample cat[wetlab.Biosample] None None unique treatment identifier, distinguishes rep... 0 0 None None None True None 1 None 3 2025-02-26 10:59:36.743558+00:00 1 {'af': {'0': None, '1': True}} 1
18 fLwdFKBUhBY9 drugname_drugconc cat[wetlab.CompoundPerturbation] None None drug name, concentration and concentration unit 0 0 None None None True None 1 None 3 2025-02-25 23:04:17.541812+00:00 1 {'af': {'0': None, '1': True}} 1
17 Q0cj2JR5Juwn drug cat[wetlab.Compound] None None drug name, as used in the drugname_drugconc field 0 0 None None None True None 1 None 3 2025-02-25 23:02:05.717794+00:00 1 {'af': {'0': None, '1': True}} 1
16 dQELv2sIVnJX BARCODE str None None barcode id 0 0 None None None True None 1 None 3 2025-02-25 22:35:15.627971+00:00 1 {'af': {'0': None, '1': True}} 1
15 3X4d0QEUuprp sublibrary str None None sublibrary id (related to library prep and seq... 0 0 None None None True None 1 None 3 2025-02-25 22:35:14.673178+00:00 1 {'af': {'0': None, '1': True}} 1
12 vw5PQ3jN6vJV BARCODE_SUB_LIB_ID str None None cell identifier 0 0 None None None True None 1 None 3 2025-02-25 22:35:12.976029+00:00 1 {'af': {'0': None, '1': True}} 1
11 KPT70T8xJLIt cell_name cat[bionty.CellLine] None None commonly-used cell line name (related to the c... 0 0 None None None True None 1 None 3 2025-02-25 22:32:56.082195+00:00 1 {'af': {'0': None, '1': True}} 1
10 CF0O0e0WZxFz G2M_score float None None inferred G2M phase score 0 0 None None None True None 1 None 3 2025-02-25 22:31:22.708895+00:00 1 {'af': {'0': None, '1': True}} 1
9 bujDkB4Nd1S5 S_score float None None inferred S phase score 0 0 None None None True None 1 None 3 2025-02-25 22:31:22.144135+00:00 1 {'af': {'0': None, '1': True}} 1
8 X640W5tBUPOQ pcnt_mito float None None percentage mitochondrial reads 0 0 None None None True None 1 None 3 2025-02-25 22:31:21.581885+00:00 1 {'af': {'0': None, '1': True}} 1
7 PZDiL36nJSFv mread_count int None None number of reads per cell 0 0 None None None True None 1 None 3 2025-02-25 22:30:31.810331+00:00 1 {'af': {'0': None, '1': True}} 1
6 LHUmmYKjIGPl tscp_count int None None number of transcripts, aka UMI count 0 0 None None None True None 1 None 3 2025-02-25 22:30:31.236532+00:00 1 {'af': {'0': None, '1': True}} 1
5 IjSP1lCY3Hyw gene_count int None None number of genes with at least one count 0 0 None None None True None 1 None 3 2025-02-25 22:30:30.668750+00:00 1 {'af': {'0': None, '1': True}} 1
4 vshELphl73qp cell_line cat[bionty.CellLine.ontology_id] None None cell line Cellosaurus identifier 0 0 None None None True None 1 None 3 2025-02-25 22:27:22.393997+00:00 1 {'af': {'0': None, '1': True}} 1
3 PVpyJhciLdCQ pass_filter cat[ULabel[PassFilter]] None None full' filters are more stringent on gene_count... 0 0 None None None True None 1 None 3 2025-02-25 22:25:30.918235+00:00 1 {'af': {'0': None, '1': True}} 1
2 QboQ1Q1Yxsjn phase cat[ULabel[Phase]] None None inferred cell cycle phase (G1, S, G2M) 0 0 None None None True None 1 None 3 2025-02-25 22:21:56.935262+00:00 1 {'af': {'0': None, '1': True}} 1
1 YRSYWdIiesqL plate cat[ULabel[Plate]] None None plate number 0 0 None None None True None 1 None 3 2025-02-25 22:03:51.786985+00:00 1 {'af': {'0': None, '1': True}} 1

The Tahoe-100M collection

Every individual dataset in the atlas is an .h5ad file that is registered as an artifact in LaminDB.

First, query the Tahoe-100M collection.

# get the collection: https://lamin.ai/laminlabs/arc-virtual-cell-atlas/collection/BpavRL4ntRTzWEE5
collection = ln.Collection.get(key="tahoe100")
# 14 artifacts in this collection, each corresponding to one plate
collection.artifacts.df()
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux _branch_code
id
1369 XVSrkq9pyF1OBLgG0000 2025-02-25/h5ad/plate3_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 13173722269 Jnrt7DaSUCGn8D8LS2itaw None 4705402 md5 False False 1 2 3 None True 1 2025-02-25 23:22:20.497965+00:00 1 None 1
1366 vn5cUJCHbjpPPsZx0000 2025-02-25/h5ad/plate14_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 22427932564 FrnStRehP16siRGG35ou+g None 6518806 md5 False False 1 2 3 None True 1 2025-02-25 23:22:19.357999+00:00 1 None 1
1362 56uA9lPPmJ4zLUcr0000 2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 26536400717 j1FXsX7hs7u+eBqnWnmNHw None 8044908 md5 False False 1 2 3 None True 1 2025-02-25 23:22:17.849980+00:00 1 None 1
1365 9L9HZ55HqUL0aqaR0000 2025-02-25/h5ad/plate13_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 28071589885 RKOiaay+CHvv+Ukk/N+28A None 8501658 md5 False False 1 2 3 None True 1 2025-02-25 23:22:18.977981+00:00 1 None 1
1372 aAHQ3zbD7n1asyYr0000 2025-02-25/h5ad/plate6_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 28934897078 NYvQEqVClziHm0ozWhOw1w None 7545393 md5 False False 1 2 3 None True 1 2025-02-25 23:22:21.629962+00:00 1 None 1
1373 DC5cacdJr1VoEXnl0000 2025-02-25/h5ad/plate7_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 16514746341 NOS4MY6eYYPOnAB8ViyWYg None 5692117 md5 False False 1 2 3 None True 1 2025-02-25 23:22:22.009157+00:00 1 None 1
1375 BDttiuV3Te8VB0dU0000 2025-02-25/h5ad/plate9_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 18791302576 4kHbVbmreg6akW6ZgsjxaA None 5866669 md5 False False 1 2 3 None True 1 2025-02-25 23:22:22.759201+00:00 1 None 1
1374 czC19UpUEszVH2bU0000 2025-02-25/h5ad/plate8_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 30390935958 ilAzEPIh4FlDeTFaJ1dILw None 8880979 md5 False False 1 2 3 None True 1 2025-02-25 23:22:22.387666+00:00 1 None 1
1370 tKTeff0ugWqAm4P70000 2025-02-25/h5ad/plate4_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 23292672278 BkBXznbSovNWXtzPFITPcQ None 7004356 md5 False False 1 2 3 None True 1 2025-02-25 23:22:20.879928+00:00 1 None 1
1371 EZATJLC4jE7pmwo40000 2025-02-25/h5ad/plate5_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 19763140865 VMBKFzOI5cj7UC1UDENP4A None 6419498 md5 False False 1 2 3 None True 1 2025-02-25 23:22:21.255154+00:00 1 None 1
1367 aJIqo7bNyJAs9z0r0000 2025-02-25/h5ad/plate1_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 19070623904 9iCNcouMqfNS3HA/2GUWOA None 5481420 md5 False False 1 2 3 None True 1 2025-02-25 23:22:19.737995+00:00 1 None 1
1364 S2h2rPLCaUhZAM9u0000 2025-02-25/h5ad/plate12_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 37495736876 VjAkWVFGVpzAMi9Innusuw None 10487057 md5 False False 1 2 3 None True 1 2025-02-25 23:22:18.600910+00:00 1 None 1
1368 ZFeVfd0ugAHeWCxm0000 2025-02-25/h5ad/plate2_filt_Vevo_Tahoe100M_WSe... None .h5ad dataset AnnData 29037152127 usxviuqGbuw0RYnECCVCWw None 8064658 md5 False False 1 2 3 None True 1 2025-02-25 23:22:20.113956+00:00 1 None 1
1363 omn7JStfJMzy8m6O0000 2025-02-25/h5ad/plate11_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 23230802756 N2mzoYlMLEl6PdecaYyDvw None 7435869 md5 False False 1 2 3 None True 1 2025-02-25 23:22:18.229629+00:00 1 None 1
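Aggregating the n_observations and size columns above shows how large the collection is. In a live session, `collection.artifacts.df()[["n_observations", "size"]].sum()` gives the totals directly. So that it runs offline, the sketch below hardcodes the 14 per-plate values copied from the table.

```python
# Per-plate (n_observations, size in bytes), copied from the
# collection.artifacts.df() output above
plates = {
    "plate1": (5481420, 19070623904),
    "plate2": (8064658, 29037152127),
    "plate3": (4705402, 13173722269),
    "plate4": (7004356, 23292672278),
    "plate5": (6419498, 19763140865),
    "plate6": (7545393, 28934897078),
    "plate7": (5692117, 16514746341),
    "plate8": (8880979, 30390935958),
    "plate9": (5866669, 18791302576),
    "plate10": (8044908, 26536400717),
    "plate11": (7435869, 23230802756),
    "plate12": (10487057, 37495736876),
    "plate13": (8501658, 28071589885),
    "plate14": (6518806, 22427932564),
}

total_cells = sum(n_obs for n_obs, _ in plates.values())
total_bytes = sum(size for _, size in plates.values())
# plate whose .h5ad file is largest on disk
largest = max(plates, key=lambda plate: plates[plate][1])

print(f"{len(plates)} plates, {total_cells:,} cells, {total_bytes / 1e9:.1f} GB")
print(f"largest file: {largest} ({plates[largest][1] / 1e9:.1f} GB)")
# 14 plates, 100,648,790 cells, 336.7 GB
# largest file: plate12 (37.5 GB)
```

The total cell count equals the n_observations of the obs_metadata.parquet artifact used further below. Each plate file is between roughly 13 and 37.5 GB, which is worth knowing before calling .cache().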
# inspect the curated metadata of one artifact
# (the order is not guaranteed to match the table above; here it is plate10)
artifact1 = collection.artifacts.all()[0]
artifact1.describe()
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
Artifact .h5ad/AnnData
├── General
│   ├── .uid = '56uA9lPPmJ4zLUcr0000'
│   ├── .key = '2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WServicesFrom_ParseGigalab.h5ad'
│   ├── .size = 26536400717
│   ├── .hash = 'j1FXsX7hs7u+eBqnWnmNHw'
│   ├── .n_observations = 8044908
│   ├── .path = gs://arc-ctc-tahoe100/2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WServicesFrom_ParseGigalab.h5ad
│   ├── .created_by = sunnyosun (Sunny Sun)
│   ├── .created_at = 2025-02-25 23:22:17
│   └── .transform = 'Register Tahoe-100M'
├── Dataset features/schema
│   ├── var • 62710              [bionty.Gene.stable_id]
│   │   TSPAN6                      float                                                               
│   │   TNMD                        float                                                               
│   │   DPM1                        float                                                               
│   │   SCYL3                       float                                                               
│   │   C1orf112                    float                                                               
│   │   FGR                         float                                                               
│   │   CFH                         float                                                               
│   │   FUCA2                       float                                                               
│   │   GCLC                        float                                                               
│   │   NFYA                        float                                                               
│   │   STPG1                       float                                                               
│   │   NIPAL3                      float                                                               
│   │   LAS1L                       float                                                               
│   │   ENPP4                       float                                                               
│   │   SEMA3F                      float                                                               
│   │   CFTR                        float                                                               
│   │   ANKIB1                      float                                                               
│   │   CYP51A1                     float                                                               
│   │   KRIT1                       float                                                               
│   │   RAD52                       float                                                               
│   └── obs • 16                 [Feature]
│       cell_line                   cat[bionty.CellLine.onto…  A-172, A-427, A498, A549, AN3 CA, AsPC-1…
│       cell_name                   cat[bionty.CellLine]       A-172, A-427, A498, A549, AN3 CA, AsPC-1…
│       drug                        cat[wetlab.Compound]       5-Azacytidine, 5-Fluorouracil, Abiratero…
│       drugname_drugconc           cat[wetlab.CompoundPertu…  [('5-Azacytidine', 0.05, 'uM')], [('5-Fl…
│       pass_filter                 cat[ULabel[PassFilter]]    full, minimal
│       phase                       cat[ULabel[Phase]]         G1, G2M, S
│       plate                       cat[ULabel[Plate]]         plate10
│       sample                      cat[wetlab.Biosample]      smp_2359, smp_2360, smp_2361, smp_2362, …
│       gene_count                  int
│       tscp_count                  int
│       mread_count                 int
│       pcnt_mito                   float
│       S_score                     float
│       G2M_score                   float
│       sublibrary                  str
│       BARCODE                     str
└── Labels
    └── .projects                   Project                    Tahoe-100M                               
        .references                 Reference                  Tahoe-100M: A Giga-Scale Single-Cell Per…
        .organisms                  bionty.Organism            human                                    
        .cell_lines                 bionty.CellLine            Panc 03.27, PANC-1, NCI-H460, HEC-1-A, M…
        .compounds                  wetlab.Compound            Acetazolamide, Neratinib, Tazarotene, 5-…
        .compound_perturbations     wetlab.CompoundPerturbat…  [('5-Azacytidine', 0.05, 'uM')], [('Iver…
        .biosamples                 wetlab.Biosample           smp_2430, smp_2365, smp_2360, smp_2369, …
        .ulabels                    ULabel                     tahoe-100, plate10, G1, G2M, S, full, mi…

Query artifacts of interest based on metadata

Let’s find which datasets contain A549 cells perturbed with Piroxicam.

drugs = wl.Compound.lookup()
cell_lines = bt.CellLine.lookup()

collection.artifacts.filter(compounds=drugs.piroxicam, cell_lines=cell_lines.a549).df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux _branch_code
id
1362 56uA9lPPmJ4zLUcr0000 2025-02-25/h5ad/plate10_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 26536400717 j1FXsX7hs7u+eBqnWnmNHw None 8044908 md5 False False 1 2 3 None True 1 2025-02-25 23:22:17.849980+00:00 1 None 1
1363 omn7JStfJMzy8m6O0000 2025-02-25/h5ad/plate11_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 23230802756 N2mzoYlMLEl6PdecaYyDvw None 7435869 md5 False False 1 2 3 None True 1 2025-02-25 23:22:18.229629+00:00 1 None 1
1364 S2h2rPLCaUhZAM9u0000 2025-02-25/h5ad/plate12_filt_Vevo_Tahoe100M_WS... None .h5ad dataset AnnData 37495736876 VjAkWVFGVpzAMi9Innusuw None 10487057 md5 False False 1 2 3 None True 1 2025-02-25 23:22:18.600910+00:00 1 None 1

You can download an .h5ad file into your local cache (this plate10 file is about 26.5 GB):

artifact1.cache()

Or open it for streaming access instead of downloading the whole file:

artifact1.open()

Open the obs metadata parquet file as a PyArrow Dataset

The obs metadata of all plates is also available as a single parquet file (2.29 GB, 100,648,790 rows). Open it as a PyArrow Dataset.

ulabels = ln.ULabel.lookup()
parquet_artifact = ln.Artifact.filter(
    key__contains="obs_metadata.parquet", ulabels=ulabels.tahoe_100
).one()
parquet_artifact
Artifact(uid='y1TTR9wbrmZEwpOa0000', is_latest=True, key='2025-02-25/metadata/obs_metadata.parquet', suffix='.parquet', otype='DataFrame', size=2293981573, hash='qEWOpGw9CmQVzaElyMWT1Q', n_observations=100648790, space_id=1, storage_id=2, run_id=1, created_by_id=1, created_at=2025-02-25 19:33:42 UTC)
dataset = parquet_artifact.open()
dataset.schema
! run input wasn't tracked, call `ln.track()` and re-run
plate: string
BARCODE_SUB_LIB_ID: string
sample: string
gene_count: int64
tscp_count: int64
mread_count: int64
drugname_drugconc: string
drug: string
cell_line: dictionary<values=string, indices=int32, ordered=0>
sublibrary: string
BARCODE: string
pcnt_mito: float
S_score: double
G2M_score: double
phase: dictionary<values=string, indices=int32, ordered=0>
pass_filter: dictionary<values=string, indices=int32, ordered=0>
cell_name: dictionary<values=string, indices=int32, ordered=0>
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 2487

Select the A549 cells perturbed with Piroxicam:

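# select cells by drug and cell line; the filter is applied while scanning,
# so only matching rows end up in the resulting table
filter_expr = (pc.field("drug") == drugs.piroxicam.name) & (
    pc.field("cell_name") == cell_lines.a549.name
)
df = dataset.scanner(filter=filter_expr).to_table().to_pandas()
# number of matching cells per plate
df.value_counts("plate")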
plate
plate12    2818
plate10    2812
plate11    2279
Name: count, dtype: int64
df.head()
plate BARCODE_SUB_LIB_ID sample gene_count tscp_count mread_count drugname_drugconc drug cell_line sublibrary BARCODE pcnt_mito S_score G2M_score phase pass_filter cell_name
29314 plate10 50_030_183-lib_1681 smp_2408 644 863 1024 [('Piroxicam', 0.05, 'uM')] Piroxicam CVCL_0023 lib_1681 50_030_183 0.101970 -0.282297 -0.165568 G1 full A549
29337 plate10 50_035_135-lib_1681 smp_2408 1130 1570 1827 [('Piroxicam', 0.05, 'uM')] Piroxicam CVCL_0023 lib_1681 50_035_135 0.077070 -0.335042 -0.280220 G1 full A549
29338 plate10 50_035_171-lib_1681 smp_2408 1058 1534 1809 [('Piroxicam', 0.05, 'uM')] Piroxicam CVCL_0023 lib_1681 50_035_171 0.124511 -0.402028 -0.404579 G1 full A549
29352 plate10 50_038_157-lib_1681 smp_2408 1265 1883 2240 [('Piroxicam', 0.05, 'uM')] Piroxicam CVCL_0023 lib_1681 50_038_157 0.147106 -0.455343 -0.311355 G1 full A549
29355 plate10 50_039_078-lib_1681 smp_2408 1355 1914 2258 [('Piroxicam', 0.05, 'uM')] Piroxicam CVCL_0023 lib_1681 50_039_078 0.070010 -0.349396 0.186264 G2M full A549
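The pass_filter column separates two groups of cells: those passing the more stringent 'full' filter and those passing only 'minimal'.

The sketch below keeps only 'full' cells and counts them per plate and per drug/dose string with `pandas.crosstab`. It runs on hypothetical toy rows shaped like `df` above, not on atlas data.

```python
import pandas as pd

# Hypothetical toy rows shaped like the `df` loaded from the parquet file above
cells = pd.DataFrame(
    {
        "plate": ["plate10", "plate10", "plate10", "plate11", "plate12"],
        "drugname_drugconc": [
            "[('Piroxicam', 0.05, 'uM')]",
            "[('Piroxicam', 0.05, 'uM')]",
            "[('Piroxicam', 5.0, 'uM')]",
            "[('Piroxicam', 0.05, 'uM')]",
            "[('Piroxicam', 5.0, 'uM')]",
        ],
        "pass_filter": ["full", "minimal", "full", "full", "full"],
    }
)

# Keep only cells that pass the more stringent 'full' filter
strict = cells[cells["pass_filter"] == "full"]
# Count cells per plate (rows) and drug/dose combination (columns)
counts = pd.crosstab(strict["plate"], strict["drugname_drugconc"])
print(counts)
```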

TBD