Parse PDF Metadata

hannesfrank · April 14, 2021, 10:31am

If a paper contains metadata, RemNote already tries to parse it as best it can. You can even copy and paste the PDF’s outline if it contains one.

One thing that does not work is authors separated by “;”. They all end up in a single rem.

Do you have other PDFs where the metadata parsing fails?
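
To make the failing case concrete, here is a minimal Python sketch of how an author string could be split on either separator. It is not RemNote’s actual parser; the function name `split_authors` and the sample strings are assumptions made for illustration. One known limitation: a single author written as “Last, First” with no semicolon would be split in two.

```python
def split_authors(raw: str) -> list[str]:
    """Split a PDF 'Author' metadata string into individual names.

    Hypothetical helper: if the string contains any ';', that is treated
    as the separator (so "Last, First; Last, First" stays intact);
    otherwise the string is split on ','.
    """
    separator = ";" if ";" in raw else ","
    # Trim whitespace around each name and drop empty pieces.
    parts = (part.strip() for part in raw.split(separator))
    return [part for part in parts if part]


print(split_authors("Jane Doe; John Smith"))
print(split_authors("Doe, Jane; Smith, John"))
```

With this approach each author would become its own entry instead of one combined rem.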

_yb · August 11, 2021, 2:43am

To retrieve metadata from a PDF, I use this. I am OK with this system, mainly because I have control over the result.

However, I often see that RemNote tries to retrieve metadata on my behalf. More often than not, the metadata is not what I need, is wrong (the most common example being a list of authors registered as one author), and is redundant with the system above.

Now I see this, which I don’t understand. Does RemNote create Power-Up tags based on the metadata? I don’t get it. Does someone know what is going on? It’s very annoying.

Also: is there a way to stop RemNote from trying to retrieve metadata altogether?

nukesean · August 15, 2021, 3:50am

Agreed! RemNote keeps creating error-filled rems every time I add a PDF (e.g., author names and keywords all being combined into single rems); and even if the parsing were correct, it would still not be what I wanted in many cases. For example, it automatically creates author names as “First Last” when I would want the author names to be stored as “Last, First” instead.
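
As a rough illustration of the “Last, First” preference, here is a small Python sketch that reorders a name. The helper `to_last_first` is hypothetical and not part of RemNote; it assumes the final whitespace-separated token is the surname, which is wrong for names such as “Ludwig van Beethoven”.

```python
def to_last_first(name: str) -> str:
    """Convert "First [Middle] Last" to "Last, First [Middle]".

    Hypothetical helper: treats the last token as the surname and
    leaves names that already contain a comma unchanged.
    """
    name = name.strip()
    if "," in name:
        return name  # assumed to be in "Last, First" form already
    tokens = name.split()
    if len(tokens) < 2:
        return name  # single-word names have nothing to reorder
    return f"{tokens[-1]}, {' '.join(tokens[:-1])}"


print(to_last_first("Jane Q. Doe"))
```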

I’m also not entirely sure what the purpose of this is since I cannot seem to easily peruse a list of the items generated by these Power-Ups. For example, if I go directly to the “Keywords” Power-Up, it is completely empty—there are no search portals or anything inside. If I go up one level to the “File” Power-Up, however, I can see that the “Keywords” Power-Up has indeed been referenced several times. So then why is there no search portal inside listing these references? Perhaps there is a simple explanation that I am missing due to still being a newbie, but I cannot understand why it should behave any differently than any other rem when it comes to listing references.

I’m sure this will be a very useful feature if/when all the kinks get ironed out, but right now, all it’s doing is creating more work for me: I have to go back into each PDF, correct the information, and then manually delete all the extra rems it created in the process. So yeah, I reeeeally, really hope they provide a way for us to disable this automatic behavior. Even better would be a way to manually retrieve/correct the metadata via DOI or some other identifier. Hey, a guy can dream, right? haha

At the very least, I really think this should be an optional setting under “Labs” since it’s clearly not ready for primetime yet. Indeed, it has literally never once gotten the metadata right for any PDF that it’s tried, unfortunately.

_yb · August 16, 2021, 1:25pm

I agree with you: the automatic retrieval of PDF metadata could be a Labs feature until it’s actually working.
Hello @hannesfrank, do you know if it is “easily” feasible? Sorry if this shouldn’t be addressed to you.

_yb · August 16, 2021, 1:27pm

In the meantime, you can use this: