Easiest scripting method to merge two text files - Ruby, Python, JavaScript, Java? - Stack Overflow

I have two text files, one containing HTML and the other containing URL slugs:

FILE 1 (HTML):

<li><a href="/article/"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>
...

FILE 2 (URL SLUGS):

thomas-friedman-the-world-is-flat
michael-dagleish-scotland-in-wartime
dr-raymond-kinsella-progress-in-cancer-treatments
...

I need to merge them so that the slugs in FILE 2 are inserted into the HTML in FILE 1 like this:

OUTPUT:

<li><a href="/article/thomas-friedman-the-world-is-flat"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/michael-dagleish-scotland-in-wartime"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/dr-raymond-kinsella-progress-in-cancer-treatments"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>

What's the best approach, and which language would be most appropriate to accomplish this task with a minimum of complexity?

asked Dec 18, 2010 at 0:15 by fidlrz
  • I'd do it in perl, but only because I have 10+ years of experience in perl. – Paul Tomblin Commented Dec 18, 2010 at 0:17
  • Are the files guaranteed to be in the same order? – Katriel Commented Dec 18, 2010 at 0:31
  • I'd do it in perl, but only because perl would be really good for this problem. I have 10+ minutes of experience in perl. – Cameron Skinner Commented Dec 18, 2010 at 0:33
  • @katrielalex: Good question. Yes, the files match up, line for line. – fidlrz Commented Dec 18, 2010 at 0:40

6 Answers

You need a zip function, which is available in most languages. Its purpose is to process two or more arrays in parallel.
In Ruby it would look something like this:

f1 = File.readlines('file1.txt')
f2 = File.readlines('file2.txt')

File.open('file3.txt','w') do |output_file|

    f1.zip(f2) do |a,b|
        # chomp drops the slug's trailing newline so it doesn't split the href
        output_file.puts a.sub('/article/', '/article/' + b.chomp)
    end

end

To zip more than two arrays, you can do f1.zip(f2,f3,...) do |a,b,c,...|

This will be easy in any language. Here it is in pseudo-Python; I've omitted the lxml bits because I don't have access to them and I can't quite remember the syntax. They're not difficult, though.

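with open(...) as htmls, open(...) as slugs, open(...) as output:
    for html, slug in zip(htmls, slugs):
        root = lxml.etree.fromstring(html)
        # do some fiddling with lxml to get the name

        # drop the leading slug words that correspond to the words in the name
        slug = slug.strip().split("-")[len(name.split()):]
        # add in the extra child in lxml

        # lxml serializes through etree.tostring(); elements have no tostring() method
        output.write(lxml.etree.tostring(root, encoding="unicode") + "\n")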

Interesting features:

  • This doesn't read in the entire file at once; it does it chunk by chunk (well, line-by-line but Python will buffer it). Useful if the files are huge, but probably irrelevant.

  • lxml may be overkill, depending on how rigid the format of the html strings is. If they're guaranteed to be the same and all well-formed, it might be easier for you to use simple string operations. On the other hand, lxml is pretty fast and offers a lot more flexibility.
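If the format really is as rigid as the sample lines in the question, the "simple string operations" route could look roughly like the sketch below. It is only a sketch: the names `merge_lines`, `merge_files` and `PLACEHOLDER` are made up for illustration, and it assumes each HTML line contains the empty `href="/article/"` exactly once and that the two files match up line for line.

```python
from typing import Iterable, Iterator

# Assumption: every HTML line contains this empty link exactly once.
PLACEHOLDER = 'href="/article/"'


def merge_lines(html_lines: Iterable[str], slug_lines: Iterable[str]) -> Iterator[str]:
    """Yield each HTML line with the matching slug inserted into its href."""
    for html, slug in zip(html_lines, slug_lines):
        slug = slug.strip()  # drop the trailing newline read from the slug file
        # Replace only the first occurrence so the rest of the line is untouched.
        yield html.replace(PLACEHOLDER, f'href="/article/{slug}"', 1)


def merge_files(html_path: str, slug_path: str, out_path: str) -> None:
    """Stream both inputs line by line; neither file is loaded whole."""
    with open(html_path, encoding="utf-8") as htmls, \
         open(slug_path, encoding="utf-8") as slugs, \
         open(out_path, "w", encoding="utf-8") as output:
        output.writelines(merge_lines(htmls, slugs))
```

Calling `merge_files("file1.txt", "file2.txt", "output.txt")` would write the merged lines; those file names are placeholders. Note that `zip` stops silently at the shorter input, so a mismatch in line counts goes unnoticed here.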

Ruby one liner:

File.open("joined.txt","w") { |f| f.puts ['file1.txt', 'file2.txt'].map{ |s| IO.read(s) }}

The easiest way to do this is to use the language of the listed ones that you are most familiar with. Even if it doesn't produce the neatest solution, you'll get the job done with the least (mental) effort.

If you know none of them, then Perl is a good option because this is the kind of thing it was designed to do. (I'm assuming that you understand regular expressions ...) And by the look of some of the other answers, Python is a good option too.
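For what it's worth, the regular-expression idea is not tied to Perl. Here is a rough Python sketch of it; `insert_slug` and `EMPTY_HREF` are hypothetical names, and the pattern assumes the link to fill always looks exactly like the empty `href="/article/"` in the question.

```python
import re

# Matches 'href="/article/' only when the closing quote follows immediately,
# so links that already carry a slug are left alone.
EMPTY_HREF = re.compile(r'(href="/article/)(?=")')


def insert_slug(html_line: str, slug: str) -> str:
    """Return html_line with slug appended to the first empty /article/ href."""
    slug = slug.strip()  # remove the newline that comes with a line read from a file
    # A function replacement keeps re from interpreting backslashes in the slug.
    return EMPTY_HREF.sub(lambda m: m.group(1) + slug, html_line, count=1)
```

The lookahead is the one thing a regex buys you over a plain string replace here: a line whose href is already filled in passes through unchanged.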

Python is a great language. Just have a look at these few lines of Python; they can merge text files of any size. I have just merged two text files of 10 GB each.

 with open("E:/temp/3.txt", "wb") as o:  # open for write
     for path in ("E:/temp/1.txt", "E:/temp/2.txt"):
         with open(path, "rb") as f:  # each input is closed once it has been copied
             for line in f:
                 o.write(line)

PHP is the easiest!

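$firstFile = file('file1.txt');
$secondFile = file('file2.txt');

$findKey = '/article/';
$output = '';

// Stop early if the two files don't have the same number of lines.
if (count($firstFile) != count($secondFile)) {
    die('record counts dont match');
}

for ($i = 0; $i < count($firstFile); $i++) {
    // trim() removes the slug's trailing newline before it is appended to the key
    $output .= str_replace($findKey, $findKey . trim($secondFile[$i]), $firstFile[$i]);
}

file_put_contents('output.txt', $output);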
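The record-count check above can also be done without loading both files into arrays first. A possible Python sketch, with `merge_checked` and `_MISSING` as made-up names, pads the shorter input with a sentinel and fails as soon as one side runs out:

```python
from itertools import zip_longest
from typing import Iterable, Iterator

_MISSING = object()  # sentinel that marks the input which ran out first


def merge_checked(html_lines: Iterable[str], slug_lines: Iterable[str]) -> Iterator[str]:
    """Yield merged lines; raise ValueError if the inputs differ in length."""
    pairs = zip_longest(html_lines, slug_lines, fillvalue=_MISSING)
    for number, (html, slug) in enumerate(pairs, start=1):
        if html is _MISSING or slug is _MISSING:
            raise ValueError(f"record counts don't match at line {number}")
        # Same replacement as the PHP loop: append the trimmed slug to the key.
        yield html.replace("/article/", "/article/" + slug.strip(), 1)
```

One difference from the PHP version: this is a generator, so lines before the mismatch have already been produced by the time the error is raised. If partial output is a problem, collect the result in a list before writing anything.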
