Empathy List Archives

pdb-l@lists.wwpdb.org

The PDB mailing list

Download limitations

Bernhard Rupp

Fri, Nov 24, 2023 6:43 PM

Dear Wardens of the Structure Treasures,

I am using

https://www.rcsb.org/search/advanced

for data mining of basic PDB records, which means there are often on the
order of 10^4 to 10^5 results.

The CSV tables are then broken into smaller blocks of 2500, and I need to
download them piecemeal and then concatenate them.
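
A minimal sketch of the concatenation step described above, assuming each downloaded block is a CSV text with one identical header row (the actual layout of the downloaded files is not stated in this thread and should be checked). The function name `concatenate_csv_blocks` is hypothetical.

```python
import csv
import io
from typing import Iterable


def concatenate_csv_blocks(blocks: Iterable[str]) -> str:
    """Join CSV texts that share a header row, writing the header only once.

    Assumption: every block starts with the same single header row.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in blocks:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty blocks
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError("blocks do not share the same header row")
        writer.writerows(rows[1:])
    return out.getvalue()
```

In practice the blocks would be read from the downloaded files, for example with `pathlib.Path(name).read_text()`, in the order they were downloaded.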

As all CSV files need to be created server-side in any case by the JavaScript,
is there any pressing reason not to provide a download option for the entire
search results in one (compressed) file?

Thx, BR

Bernhard Rupp

k.k. Hofkristallamt

http://www.hofkristallamt.org/

br@hofkristallamt.org

+1 925 209 7429

Institute of Genetic Epidemiology

Medical University Innsbruck

Schöpfstr. 41

A 6020 Innsbruck

bernhard.rupp@i-med.ac.at

+43 676 571 0536

Many plausible ideas vanish

at the presence of thought


Yana Valasatava

Sun, Nov 26, 2023 6:27 PM

Hello Bernhard,

When it comes to retrieving a large amount of data, using APIs is a more effective approach. You can find a comprehensive introduction to the RCSB PDB APIs in this crash course video https://youtu.be/ZyCQeentCtw?si=JowT4QVdURF167x_.

Other useful resources:
Search API tutorial https://search.rcsb.org/#search-api
Data API tutorial https://data.rcsb.org/#data-api

Please do not hesitate to reach out to our Help Desk if you have any questions: info@rcsb.org

Regards,
Yana Rose

——————————————————————
Yana Rose, Ph.D.
Scientific Software Developer & Project Manager
RCSB Protein Data Bank
University of California San Diego
10100 Hopkins Dr, La Jolla, CA 92093

The archive of messages, sent to pdb-l@lists.wwpdb.org, can be found at:
https://lists.wwpdb.org/empathy/list/pdb-l.lists.wwpdb.org

To subscribe via email, send a message with subject or body 'subscribe' to:
pdb-l-request@lists.wwpdb.org
and follow the instructions in the newly received email.

To unsubscribe via email, send a message with subject or body 'unsubscribe' to:
pdb-l-request@lists.wwpdb.org
and follow the instructions in the newly received email.


Jose Duarte

Sun, Nov 26, 2023 6:32 PM

The tabular reports and associated CSV-format downloads are intended as an
exploratory tool and as a way to get small batches of data for quick
analysis.

The recommended way to get larger volumes of data is via the RCSB PDB APIs,
mainly the Search API and the Data API (see
https://www.rcsb.org/docs/programmatic-access/web-services-overview). The
APIs provide much more flexibility and power than the tabular report UI and
have no limitations on how much data can be downloaded. The output format
is JSON, which can express the one-to-many relations in the data more
accurately than CSV.
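
As a rough illustration of retrieving a large result set through the Search API, the sketch below pages through the hits and yields entry identifiers. The endpoint URL, the `request_options.paginate` fields and the `result_set`/`total_count` response keys are assumptions taken from the public Search API documentation and should be verified there. The helper names (`build_request`, `post_json`, `iter_identifiers`) and the page size are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator

# Assumed endpoint; check the Search API documentation before use.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_request(query: Dict, start: int, rows: int) -> Dict:
    """Wrap a search query with pagination options (assumed request layout)."""
    return {
        "query": query,
        "return_type": "entry",
        "request_options": {"paginate": {"start": start, "rows": rows}},
    }


def post_json(payload: Dict) -> Dict:
    """POST a JSON payload and decode the JSON reply (empty reply -> {})."""
    req = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def iter_identifiers(
    query: Dict, rows: int = 1000, post: Callable[[Dict], Dict] = post_json
) -> Iterator[str]:
    """Yield identifiers page by page until all hits have been seen."""
    start = 0
    while True:
        result = post(build_request(query, start, rows))
        hits = result.get("result_set", [])
        for hit in hits:
            yield hit["identifier"]
        start += rows
        if not hits or start >= result.get("total_count", 0):
            break
```

The `query` argument can be the "query" part of the JSON shown by the "Search API" button on a results page. The `post` parameter exists only so the paging logic can be exercised without network access.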

Please also note that the rcsb.org user interface provides a way to
facilitate writing API queries via the "Data API" and "Search API" buttons
(with a cog-wheel icon) at the top right of many pages. Clicking the button
will give you either the Search API query that produced the search results
that you see on the screen or the Data API query to retrieve all the data
you see in a structure summary page.
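
A sketch of how a Data API query could be applied to many entries in batches and its JSON flattened into table rows. It assumes the Data API accepts GraphQL with an `entries(entry_ids: ...)` field and that `rcsb_id` and `exptl { method }` are valid fields; these names and the payload shape should be checked against the Data API documentation. The helpers `chunked`, `build_graphql_payload` and `flatten_methods` are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence

# Assumed GraphQL document; field names should be checked in the Data API schema.
ENTRY_QUERY = """
query($ids: [String!]!) {
  entries(entry_ids: $ids) {
    rcsb_id
    exptl { method }
  }
}
"""


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split a list of identifiers into batches of at most `size`."""
    for i in range(0, len(ids), size):
        yield list(ids[i:i + size])


def build_graphql_payload(entry_ids: Sequence[str]) -> Dict:
    """Build the JSON body for one batch of entry identifiers."""
    return {"query": ENTRY_QUERY, "variables": {"ids": list(entry_ids)}}


def flatten_methods(response: Dict) -> List[List[str]]:
    """Turn the one-to-many entry -> methods relation into one row per pair."""
    rows = []
    for entry in response["data"]["entries"]:
        for exptl in entry.get("exptl") or []:
            rows.append([entry["rcsb_id"], exptl["method"]])
    return rows
```

Each payload would be sent as a JSON POST to the Data API GraphQL endpoint. An entry with several experimental methods produces several rows, which is the one-to-many case that a single CSV row per entry cannot represent cleanly.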

Hope this helps

Jose

