C++ reading info from a webpage

Programming, for all ages and all languages.
VolTeK
Member
Posts: 815
Joined: Sat Nov 15, 2008 2:37 pm
Location: The Fire Nation

C++ reading info from a webpage

Post by VolTeK »

I want my program to read just one line of information from a site, for example a "daily update", and copy that information, like this:


www.examplesite.com

webpage:

Daily Update: First Day Of Program Release

end of webpage
That's all the site will display.

How can I get my C++ Win32 program to read that line from that site?
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: C++ reading info from a webpage

Post by Combuster »

1: Implement the HTTP protocol to download the page in question
2: Parse the XML to get the wanted data.
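
A minimal sketch of step 2, assuming the response has already been downloaded into a string and that the body is plain text with one item per line. The function name findLine and the "Daily Update:" prefix are illustrative, not part of any library, and chunked or compressed responses are not handled.

```cpp
#include <string>

// Copies the first body line that starts with `prefix` into `out`.
// Returns false if the header/body separator or a matching line is missing.
// Assumption: `response` is a complete, unchunked HTTP response.
bool findLine(const std::string& response, const std::string& prefix,
              std::string& out)
{
    // The headers end at the first empty line (CRLF CRLF).
    std::string::size_type pos = response.find("\r\n\r\n");
    if (pos == std::string::npos)
        return false;
    pos += 4;

    while (pos < response.size()) {
        std::string::size_type end = response.find('\n', pos);
        if (end == std::string::npos)
            end = response.size();
        std::string line = response.substr(pos, end - pos);
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);          // drop the CR of a CRLF
        if (line.compare(0, prefix.size(), prefix) == 0) {
            out = line;
            return true;
        }
        pos = end + 1;
    }
    return false;
}
```

Calling findLine(response, "Daily Update:", line) on the example page would leave the whole "Daily Update: ..." line in line.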
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Thomas
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: C++ reading info from a webpage

Post by Thomas »

Combuster wrote:
1: Implement the HTTP protocol to download the page in question
2: Parse the XML to get the wanted data.
That's rather far-fetched :wink:. See: http://www.w3.org/Library/

--Thomas
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: C++ reading info from a webpage

Post by Combuster »

Where did I say that there wasn't a library that does most of that for you? :wink:
Thomas
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: C++ reading info from a webpage

Post by Thomas »

Hi,
Yeah .. that's right :mrgreen:
Program to an interface not an implementation
--Thomas
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: C++ reading info from a webpage

Post by Gigasoft »

On Windows, either use the WinInet API or URLDownloadToCacheFile.

If the data is formatted as HTML, you can use the MSHTML component to manipulate it.
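
A rough sketch of the WinInet route, assuming the page is small plain text. The helper names downloadPage and firstLine and the agent string are made up for illustration. The WinInet part only compiles on Windows and needs wininet.lib at link time, so it is guarded with _WIN32.

```cpp
#include <string>

// Portable helper: everything up to the first CR or LF.
std::string firstLine(const std::string& text)
{
    return text.substr(0, text.find_first_of("\r\n"));
}

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>   // link with wininet.lib

// Downloads `url` and appends the raw bytes to `out`. Returns false on failure.
bool downloadPage(const char* url, std::string& out)
{
    HINTERNET session = InternetOpenA("DailyUpdateReader",  // hypothetical agent name
                                      INTERNET_OPEN_TYPE_PRECONFIG,
                                      NULL, NULL, 0);
    if (!session)
        return false;

    HINTERNET request = InternetOpenUrlA(session, url, NULL, 0,
                                         INTERNET_FLAG_RELOAD, 0);
    if (!request) {
        InternetCloseHandle(session);
        return false;
    }

    char buffer[4096];
    DWORD got = 0;
    bool ok = true;
    for (;;) {
        if (!InternetReadFile(request, buffer, sizeof buffer, &got)) {
            ok = false;
            break;
        }
        if (got == 0)          // success with zero bytes means end of data
            break;
        out.append(buffer, got);
    }

    InternetCloseHandle(request);
    InternetCloseHandle(session);
    return ok;
}
#endif
```

On Windows, downloadPage("http://www.examplesite.com/", page) followed by firstLine(page) would give the line the original question asks for, provided the site really serves nothing but that line.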
VolTeK
Member
Posts: 815
Joined: Sat Nov 15, 2008 2:37 pm
Location: The Fire Nation

Re: C++ reading info from a webpage

Post by VolTeK »

Thank you very much
dak91
Member
Posts: 43
Joined: Thu Mar 12, 2009 3:27 am
Location: Sardegna (IT)

Re: C++ reading info from a webpage

Post by dak91 »

I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: C++ reading info from a webpage

Post by Solar »

Not being (completely) serious here:

Code: Select all

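#include <stdlib.h>
#include <stdio.h>

#define MAXLEN 200

int main()
{
    FILE * input;
    char infoline[ MAXLEN ];
    /* -O fixes the output name, so an existing index.html is overwritten
       instead of the download being saved as index.html.1 */
    if ( system( "wget -q -O index.html http://www.examplesite.com/index.html" ) != 0 )
    {
        fputs( "download failed\n", stderr );
        remove( "index.html" );
        return EXIT_FAILURE;
    }
    input = fopen( "index.html", "r" );
    if ( input == NULL )
    {
        perror( "index.html" );
        return EXIT_FAILURE;
    }
    /* Only the first line is wanted; fgets() returns NULL on an empty file. */
    if ( fgets( infoline, MAXLEN, input ) == NULL )
    {
        infoline[ 0 ] = '\0';
    }
    fclose( input );
    remove( "index.html" );
    puts( infoline );
    return 0;
}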
Sorry, I just wanted to write a piece of code. 8)
Every good solution is obvious once you've found it.
fronty
Member
Posts: 188
Joined: Mon Jan 14, 2008 5:53 am
Location: Helsinki

Re: C++ reading info from a webpage

Post by fronty »

dak91 wrote:I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.

I made this simple version in C; it uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link
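
For readers without the linked file, this is roughly what a getaddrinfo(3)-based connect looks like with Berkeley sockets. It is a sketch, not fronty's code: the function name connectTo and the way errors are passed back through a string are assumptions made for the example.

```cpp
#include <cstring>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns a connected TCP socket, or -1 with a description in `error`.
// The error goes back to the caller instead of being printed here.
int connectTo(const std::string& host, const std::string& port,
              std::string& error)
{
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       // IPv4 or IPv6, whatever resolves
    hints.ai_socktype = SOCK_STREAM;   // TCP

    addrinfo* list = NULL;
    int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &list);
    if (rc != 0) {
        error = gai_strerror(rc);
        return -1;
    }

    int fd = -1;
    // Try each returned address until one accepts the connection.
    for (addrinfo* ai = list; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);

    if (fd == -1)
        error = "could not connect to " + host + ":" + port;
    return fd;
}
```

A caller would use connectTo("www.examplesite.com", "80", error) and check for -1 before sending anything.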
dak91
Member
Posts: 43
Joined: Thu Mar 12, 2009 3:27 am
Location: Sardegna (IT)

Re: C++ reading info from a webpage

Post by dak91 »

fronty wrote:
dak91 wrote:I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.

I made this simple version in C; it uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link
I made that code 2 years ago, but I remember that it compiled correctly...

Anyway, thanks for the correction about the API name.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: C++ reading info from a webpage

Post by Owen »

dak91 wrote:
fronty wrote:
dak91 wrote:I made this simple url_get function in C/Linux socket

http://www.inventati.org/dak/src/c/geturl.cpp
Did you even try to compile that? It doesn't compile, it isn't written in C, and the more correct name of the API is Berkeley sockets.

I made this simple version in C; it uses getaddrinfo(3) (a better version of gethostbyname(3) and getservbyname(3)). Tested on FreeBSD/amd64 and NetBSD/sparc. link
I made that code 2 years ago, but I remember that it compiled correctly...

Anyway, thanks for the correction about the API name.
Reading that code...

Code: Select all

using namespace std;
OMG. Bringing in a craptonne of unknown crap. Crazy.

Code: Select all

	for(int x=0;x<strlen(serverd.c_str());x++){ 
We've never heard of std::string::length() now? Which is far faster?

Oh, and we're executing strlen every freaking time through the loop.

Code: Select all

		if(serverd.c_str()[x] == '/'){ 
Wait, so we're reimplementing std::string::find/strchr now?

Code: Select all

			y = x; 
			break; 
Again with the one letter variables. I'm glad nobody is trying to read your code.

Oh, wait...

Code: Select all

	if(y!=0){
		data = server = "";
		for(int x=y;x<strlen(serverd.c_str());x++){ data += serverd.c_str()[x]; }
		for(int x=0;x<y;x++){ server += serverd.c_str()[x]; }
	}
Yay! Let's poorly reimplement std::string's substring constructor.
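
For comparison, the same split done with std::string::find and substr. This is a sketch: the function name splitHostPath is invented here, and falling back to "/" when there is no slash is an assumption, not what the original code did.

```cpp
#include <string>
#include <utility>

// Splits "host/path" at the first '/'.
// With no '/', the whole input is the host and the path defaults to "/".
std::pair<std::string, std::string> splitHostPath(const std::string& address)
{
    std::string::size_type slash = address.find('/');
    if (slash == std::string::npos)
        return std::make_pair(address, std::string("/"));
    return std::make_pair(address.substr(0, slash),   // host
                          address.substr(slash));     // path, including '/'
}
```

No manual loops, no strlen, and the variables get names that say what they hold.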

Code: Select all

	if(connect(y,(struct sockaddr*) &server_addr, sizeof(server_addr))  != 0){
		cout<<"Cannot connect...\n";
		return ""; 
	}
Oh, I encountered an error. Let's print it. Not, you know, return it to the caller. Not emit it to the error stream either.

Code: Select all

		string get_request = "GET "+data+" HTTP/1.1\n\n\n\n";
Yargh! Let's build an invalid HTTP/1.1 request. For a start, you're missing the required Host: header, and you definitely want that one. I mean, it would be such a shame if 99% of websites didn't work.

Oops.
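
A minimal sketch of a valid request, assuming a plain GET with no extra headers. The function name buildGetRequest is illustrative. Besides the Host: header, note that HTTP lines end in CRLF ("\r\n"), not a bare "\n", and that the request ends with exactly one empty line. "Connection: close" asks the server to close the socket after the response, which keeps the reading side simple.

```cpp
#include <string>

// Builds a minimal HTTP/1.1 GET request for `path` on `host`.
std::string buildGetRequest(const std::string& host, const std::string& path)
{
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";   // the empty line ends the header section
}
```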

Code: Select all

		send(y, get_request.c_str(), strlen(get_request.c_str()), 0);
Let's pretend that the OS will always send my data in one go.

Oh wait, it won't...
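
A defensive send loop costs only a few lines. This is a sketch for POSIX sockets; the name sendAll is made up, and on a blocking socket the loop will usually run just once.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Keeps calling send() until every byte of `data` has been handed to the OS.
// Returns false on error.
bool sendAll(int fd, const std::string& data)
{
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = send(fd, data.data() + sent, data.size() - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          // interrupted by a signal, try again
            return false;
        }
        sent += static_cast<std::size_t>(n);   // may be fewer bytes than requested
    }
    return true;
}
```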

Code: Select all

		char dat[10000];
		for(int x=0;x<10000;x++){ dat[x] = '\0'; }
Wait, we are reinventing memset now?!

And what if my page is bigger than 10kb? Did dynamically allocated buffers go out of style?

I mean, we are just forgetting that std::string and std::stringstream are part of the language? :-(

Code: Select all

		recv(y, dat, 10000, 0);
		close(y);	
Let's pretend the OS will always return the page in one go...
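
What a receive loop could look like instead, sketched for POSIX sockets. The name recvAll is invented, and the loop assumes the server closes the connection once the response is complete (for example because the request asked for that). Appending to a std::string also removes the fixed 10000-byte limit.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Reads until the peer closes the connection, appending everything to `out`.
// Returns false on error.
bool recvAll(int fd, std::string& out)
{
    char buffer[4096];
    for (;;) {
        ssize_t n = recv(fd, buffer, sizeof buffer, 0);
        if (n > 0)
            out.append(buffer, static_cast<std::size_t>(n));  // partial chunk
        else if (n == 0)
            return true;           // orderly shutdown: the response is complete
        else if (errno != EINTR)
            return false;          // real error; EINTR just retries
    }
}
```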

Code: Select all

		y = 0;
		char buf;
		while(buf!='\0'){
			buf = dat[y];
			data += buf;
			y++;
		}
		return data;
Again with the reimplementing basic functionality badly.

Incidentally, I notice that fronty's also makes the "send/recv will always do everything in one go" assumption, but is on the whole at least much cleaner.

It's ironic, but Solar's is the only one which works properly.

It's not like me to tear into people's code like this, but there's code with problems and code which is plain bad, and this falls into the latter category.

Seriously people, just use QNetworkAccessManager, or libwww, or libcurl.
fronty
Member
Posts: 188
Joined: Mon Jan 14, 2008 5:53 am
Location: Helsinki

Re: C++ reading info from a webpage

Post by fronty »

Owen wrote:Incidentally, I notice that fronty's also makes the "send/recv will always do everything in one go" assumption, but is on the whole at least much cleaner.
Damn, should've read it a couple more times. :D Not enough network programming for me in recent years.
dak91
Member
Posts: 43
Joined: Thu Mar 12, 2009 3:27 am
Location: Sardegna (IT)

Re: C++ reading info from a webpage

Post by dak91 »

Owen wrote: Reading that code...

I made it when I had just started programming.
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: C++ reading info from a webpage

Post by Gigasoft »

Send won't return until it has sent everything or it fails, unless the socket is set to non-blocking mode. Recv, however, may return less data than requested.