Using R to search and replace text in an Excel xml file - Stack Overflow

I primarily use R for statistics, and I also have it installed on our secure system. I generate spreadsheets with keyed information, then "unkey" them when they are moved to the secure system. Replacing known keys with the unkeyed information by hand is extremely labor intensive. I thought it would be relatively easy in R using a function (and looping through all Excel files from a given project), but I am having difficulty editing the xml file that holds the keyed info. I can "find" this file by unzipping the xlsx file, and I can see the elements in R:

> doc <- xmlTreeParse("sharedStrings.xml", useInternalNodes = TRUE)

This gives me the xml code with the strings I want to replace with real values (Sorry! I can't paste here because it interprets the xml!).

So - I would like to have my function find "DDR022-02-S2", for instance, and replace it with the real unkeyed value. I could then save it over the original xml and rezip to xlsx without losing the formatting and other information in the spreadsheet.

I have tried numerous examples here and I don't know if it's my syntax or the schema or what, but using xml_find_all, I can't seem to find any text nodes, so I can't use gsub to change values in the file (unexpected node type). I tried converting using xml_text, but it cannot coerce the type.... Anyway, I've been at this for almost 2 days and I suspect the answer is less complicated than I'm making it, but I don't routinely (or ever) use or parse xml. I'm decent with Matlab and R, but I really could use help figuring out what R function(ality) could search this xml file and replace a given text string.

I have manually edited using notepad++ and I can rezip and the sheet looks like it should (with the replaced text). With many, many files, doing this in R would save many, many hours!

Thanks for any help!

asked Mar 11 at 14:23 by user2299029
  • 1 If you need something hastily and if the strings to be replaced are unambiguous, you should be able to deal with the .xml file as a text file, using readLines() |> stringr::str_replace_all(..) |> writeLines() or similar. – r2evans Commented Mar 11 at 14:36
  • Tim G - you gave an example of what I thought I was doing. I'm not sure why, but the xml from the Excel file always yields zero text nodes. I copied your code and, sure enough, there are 3 text nodes. The only main difference I see is that the Excel file is "bound" by <sst></sst> rather than <root></root>. I don't see how I can upload the file(s) or screenshot(s) now. Is there some schema setting that makes xml_find_all ignore the document "type"? Is there a schema "setting" I'm missing? – user2299029 Commented Mar 11 at 17:20
  • 1 Ahah! Looks like there is some issue - a namespace that needs to be included.... found it here: stackoverflow/questions/64243628/…. xml_find_all(doc, "//d1:t") finds the text nodes! Ugh! Thanks. – user2299029 Commented Mar 11 at 17:58
  • yes! that's it. xmlns adds a namespace (d1). You can change it if you want with ns <- xml_ns_rename(xml_ns(doc), d1 = "ns"). So this is your solution library(xml2); xml_doc <- read_xml("<sst xmlns='https://schemas.openxmlformats./spreadsheetml/2006/main' count='12' uniqueCount='6'><si><t>DDR022-02-S2</t></si><si><t>D2232-15-S1</t></si><si><t>MP223-21-S2</t></si></sst>"); text_nodes <- xml_find_all(xml_doc, "//d1:t", xml_ns(xml_doc)); xml_text(text_nodes) <- gsub("DDR022-02-S2", "replacement", xml_text(text_nodes), fixed = TRUE); cat(as.character(xml_doc)) – Tim G Commented Mar 11 at 18:01
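
For reference, the plain-text route from r2evans' comment could look roughly like the sketch below. It swaps in base R's gsub() for stringr::str_replace_all() to avoid an extra dependency. The key_map table, the replacement values, and the helper name replace_keys_in_text are all made up for illustration, and the file path in the last comment is an assumption about where the unzipped file sits.

```r
# Hypothetical key table: names are the keyed IDs, values are the real IDs.
key_map <- c("DDR022-02-S2" = "REAL-ID-001",
             "D2232-15-S1"  = "REAL-ID-002")

# Replace every key in a character vector of lines (illustrative helper).
replace_keys_in_text <- function(lines, key_map) {
  for (key in names(key_map)) {
    # fixed = TRUE treats the key as a literal string, not a regular expression
    lines <- gsub(key, key_map[[key]], lines, fixed = TRUE)
  }
  lines
}

# Demo on an in-memory stand-in for a small sharedStrings.xml
xml_lines <- c(
  "<sst xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main' count='2' uniqueCount='2'>",
  "<si><t>DDR022-02-S2</t></si><si><t>D2232-15-S1</t></si>",
  "</sst>"
)
new_lines <- replace_keys_in_text(xml_lines, key_map)
print(new_lines)

# On a real file (path assumed):
# path <- "xl/sharedStrings.xml"
# writeLines(replace_keys_in_text(readLines(path, warn = FALSE), key_map), path)
```

This only stays safe while the keys are unambiguous, as the comment says: a key that is a prefix of another key would be replaced inside the longer one, and replacement values containing characters such as & or < would need XML escaping first.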
Add a comment  | 

1 Answer

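I know too little about xml.... I found this: stackoverflow/questions/64243628/….

It looks like the document has a default namespace, which xml2 exposes under the prefix "d1":

> xml_ns(doc)  # the output lists "d1"

Using that prefix in the XPath expression works:

> xml_find_all(doc, "//d1:t")  # finds the text nodes!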
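
Putting that together with Tim G's comment, a minimal sketch of a namespace-aware replacement might look like this. It assumes the document is read with xml2::read_xml() (xmlTreeParse() from the question belongs to the XML package, and its result cannot be passed to xml_find_all()). The key_map values and the helper name replace_keys_in_xml are hypothetical, and the inline XML is a small stand-in for a real sharedStrings.xml.

```r
library(xml2)

# Hypothetical key table: names are the keyed IDs, values are the real IDs.
key_map <- c("DDR022-02-S2" = "REAL-ID-001")

# Replace keys inside all <t> nodes of a parsed document (illustrative helper).
replace_keys_in_xml <- function(doc, key_map) {
  # xml2 assigns the prefix "d1" to the document's default namespace
  text_nodes <- xml_find_all(doc, "//d1:t", xml_ns(doc))
  if (length(text_nodes) == 0) return(invisible(doc))
  values <- xml_text(text_nodes)
  for (key in names(key_map)) {
    # fixed = TRUE treats the key as a literal string, not a regular expression
    values <- gsub(key, key_map[[key]], values, fixed = TRUE)
  }
  xml_text(text_nodes) <- values  # modifies the document in place
  invisible(doc)
}

doc <- read_xml("<sst xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main' count='3' uniqueCount='3'><si><t>DDR022-02-S2</t></si><si><t>D2232-15-S1</t></si><si><t>MP223-21-S2</t></si></sst>")
replace_keys_in_xml(doc, key_map)

result <- xml_text(xml_find_all(doc, "//d1:t", xml_ns(doc)))
print(result)

# Saving over the unzipped file (path assumed):
# write_xml(doc, "xl/sharedStrings.xml")
```

For the round trip, utils::unzip() can extract the xlsx. Zipping it back with utils::zip() relies on an external zip program being available, so that step is worth checking on the secure system. Cells with rich-text formatting may split one string across several <t> nodes, in which case a key spanning nodes would not be matched by this sketch.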
