What happens if I save the same artifacts & records twice?¶

LaminDB’s operations are idempotent in the sense defined in this document.

This allows you to re-run a notebook or script without raising errors or duplicating data. Similar behavior holds for human data entry.

Summary¶

Metadata records¶

If you try to create any metadata record (Record) and search_names is True (the default):

LaminDB will warn you if a record with a similar name exists and display a table of similar existing records.
You can then decide whether you’d like to save a new record to the database or instead query an existing one from the table.
If a name already has an exact match in a registry, LaminDB will return the existing record instead of creating a new one. For versioned entities, the version must also be passed.

If you set search_names to False, LaminDB skips the search for similar names and you populate the database directly.
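The default behavior described above (exact match returns the existing record, near match produces a warning) can be sketched in plain Python. This is a toy model under stated assumptions, not LaminDB's implementation: `ToyRegistry` and its `create` method are hypothetical, creation and saving are merged into one step for brevity, and similarity is approximated with `difflib` and an arbitrary cutoff of 0.8.

```python
import difflib


class ToyRegistry:
    """Hypothetical in-memory registry; not LaminDB's implementation."""

    def __init__(self):
        self.records = {}  # maps name -> record dict

    def create(self, name):
        """Return (record, similar_names) for the given name."""
        # An exact match returns the existing record instead of a new one.
        if name in self.records:
            return self.records[name], []
        # Assumption: similarity is approximated with difflib here.
        similar = difflib.get_close_matches(
            name, list(self.records), n=5, cutoff=0.8
        )
        record = {"name": name}
        self.records[name] = record
        return record, similar


registry = ToyRegistry()
first, _ = registry.create("My project 1")
again, _ = registry.create("My project 1")  # exact match: same object
other, similar = registry.create("My project 1a")  # near match: warning list
```

Calling `create` twice with the same name leaves the toy registry unchanged, which is the idempotency property this page describes; the `similar` list plays the role of the table of similar records.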

Data: artifacts & collections¶

If you try to create an Artifact object from content that is already stored, you’ll get the existing artifact instead.
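Content-based deduplication of this kind can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `ToyArtifactStore` is hypothetical, and SHA-256 over the file bytes is used as the content hash without any claim about which hash LaminDB computes.

```python
import hashlib
import tempfile
from pathlib import Path


class ToyArtifactStore:
    """Hypothetical store that deduplicates by content hash."""

    def __init__(self):
        self.by_hash = {}  # maps content hash -> artifact dict

    def create(self, path, description=None):
        # Assumption: SHA-256 of the file bytes identifies the content.
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        existing = self.by_hash.get(digest)
        if existing is not None:
            return existing  # same content: hand back the existing artifact
        artifact = {
            "id": len(self.by_hash) + 1,
            "hash": digest,
            "description": description,
        }
        self.by_hash[digest] = artifact
        return artifact


with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "a.bin"
    file_b = Path(tmp) / "b.bin"
    file_a.write_bytes(b"same content")
    file_b.write_bytes(b"same content")  # different path, identical bytes
    store = ToyArtifactStore()
    artifact = store.create(file_a, description="My toy artifact")
    artifact2 = store.create(file_b)
```

Because the lookup key is the hash of the bytes rather than the path, the second call returns the first artifact even though the file name differs.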

Examples¶

# !pip install 'lamindb[jupyter]'
!lamin init --storage ./test-idempotency

→ initialized lamindb: testuser1/test-idempotency

import lamindb as ln

ln.track("ANW20Fr4eZgM0000")

→ connected lamindb: testuser1/test-idempotency

→ created Transform('ANW20Fr4eZgM0000'), started new Run('IfgOiISK...') at 2025-02-20 07:27:29 UTC

→ notebook imports: lamindb==1.1.0

Metadata records¶

assert ln.settings.creation.search_names

Let us add a first record to the ULabel registry:

label = ln.ULabel(name="My project 1")
label.save()

ULabel(uid='C9E8gm9l', name='My project 1', is_type=False, created_by_id=1, run_id=1, space_id=1, created_at=2025-02-20 07:27:31 UTC)

If we create a new record, we’ll automatically get search results that indicate whether we are about to duplicate an entry:

label = ln.ULabel(name="My project 1a")

! record with similar name exists! did you mean to load it?

	uid	name	is_type	description	reference	reference_type	space_id	type_id	run_id	created_at	created_by_id	_aux	_branch_code
id
1	C9E8gm9l	My project 1	False	None	None	None	1	None	1	2025-02-20 07:27:31.644000+00:00	1	None	1

label.save()

ULabel(uid='LbTYJUMJ', name='My project 1a', is_type=False, created_by_id=1, run_id=1, space_id=1, created_at=2025-02-20 07:27:31 UTC)

If the name exactly matches an existing record, we’ll get the existing object:

label = ln.ULabel(name="My project 1")

→ returning existing ULabel record with same name: 'My project 1'

If we save it again, it will not create a new entry in the registry:

label.save()

ULabel(uid='C9E8gm9l', name='My project 1', is_type=False, created_by_id=1, run_id=1, space_id=1, created_at=2025-02-20 07:27:31 UTC)

Now, if we create a third record, we’ll get two alternatives:

label = ln.ULabel(name="My project 1b")

! records with similar names exist! did you mean to load one of them?

	uid	name	is_type	description	reference	reference_type	space_id	type_id	run_id	created_at	created_by_id	_aux	_branch_code
id
1	C9E8gm9l	My project 1	False	None	None	None	1	None	1	2025-02-20 07:27:31.644000+00:00	1	None	1
2	LbTYJUMJ	My project 1a	False	None	None	None	1	None	1	2025-02-20 07:27:31.712000+00:00	1	None	1

If we prefer not to perform a search, e.g. for performance reasons or because the logging is too noisy, we can switch it off.

ln.settings.creation.search_names = False

label = ln.ULabel(name="My project 1c")

For the rest of this walkthrough, we switch it back on:

ln.settings.creation.search_names = True
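Switching a setting off and back on by hand is easy to forget, so a small context manager can restore the previous value automatically. The helper below is a hypothetical sketch, not part of LaminDB: `temporarily` is an invented name, and a `SimpleNamespace` stands in for a settings object such as ln.settings.creation, on the assumption that the setting is a plain assignable attribute as in the cells above.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily(obj, attr, value):
    """Set obj.attr to value inside the block, then restore the old value."""
    previous = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield obj
    finally:
        # Restore even if the block raises an exception.
        setattr(obj, attr, previous)


# Stand-in for a settings object such as ln.settings.creation (assumption).
creation = SimpleNamespace(search_names=True)

with temporarily(creation, "search_names", False):
    inside = creation.search_names  # search is off inside the block
after = creation.search_names  # previous value is back afterwards
```

The `try`/`finally` ensures the previous value is restored even when record creation inside the block fails.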

Data: artifacts and collections¶

filepath = ln.core.datasets.file_fcs()

Create an Artifact:

artifact = ln.Artifact(filepath, description="My fcs artifact").save()

Create an Artifact from the same path:

artifact2 = ln.Artifact(filepath, description="My fcs artifact")

→ found artifact with same hash: Artifact(uid='xcHbKwsyXWSXUO0r0000', is_latest=True, description='My fcs artifact', suffix='.fcs', size=19330507, hash='rCPvmZB19xs4zHZ7p_-Wrg', space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-02-20 07:27:32 UTC); to track this artifact as an input, use: ln.Artifact.get()

It gives us the existing object:

assert artifact.id == artifact2.id
assert artifact.run == artifact2.run
assert len(artifact._previous_runs.all()) == 0

If you save it again, nothing will happen (the operation is idempotent):

artifact2.save()

Artifact(uid='xcHbKwsyXWSXUO0r0000', is_latest=True, description='My fcs artifact', suffix='.fcs', size=19330507, hash='rCPvmZB19xs4zHZ7p_-Wrg', space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-02-20 07:27:32 UTC)

In the hidden cell below, you’ll see how this interplays with data lineage.

!rm -rf ./test-idempotency
!lamin delete --force test-idempotency